This paper considers the efficient estimation of copula-based semiparametric
strictly stationary Markov models. These models are characterized by
nonparametric invariant (one-dimensional marginal) distributions and
parametric bivariate copula functions, where the copulas capture the temporal
dependence and tail dependence of the processes. The Markov processes
generated via tail dependent copulas may look highly persistent and are
useful for financial and economic applications. We first show that Markov
processes generated via Clayton, Gumbel and Student's t copulas and their
survival copulas are all geometrically ergodic. We then propose a sieve
maximum likelihood estimation (MLE) for the copula parameter, the invariant
distribution and the conditional quantiles. We show that the sieve MLEs of
any smooth functionals are root-n consistent, asymptotically normal and
efficient, and that their sieve likelihood ratio statistics are
asymptotically chi-square distributed. Monte Carlo studies indicate that,
even for Markov models generated via tail dependent copulas and fat-tailed
marginals, our sieve MLEs perform very well.

1. Introduction. A copula function is a multivariate probability distribution
function with uniform marginals. Copula-based methods have become a popular
tool for modeling nonlinearity, asymmetry and tail dependence in financial
and insurance risk management. See Embrechts, McNeil and Straumann (2002),
McNeil, Frey and Embrechts (2005), Embrechts (2009), Genest, Gendron and
Bourdeau-Brien (2008), Patton (2002, 2006, 2008) and the references therein
for reviews of various theoretical properties and financial applications of
the copula approach.

While the majority of the previous work using copulas has focused on 
modeling the contemporaneous dependence between multiple univariate series,
there is also a growing number of papers using copulas to model the
temporal dependence of a univariate nonlinear time series. Granger (2003) 
defines persistence (such as "long memory" or "short memory") for general 
nonlinear time series models via copulas. Darsow, Nguyen and Olsen (1992), 
de la Peña, Ibragimov and Sharakhmetov (2006) and Ibragimov (2009) provide
characterizations of when a copula-based time series is a Markov process.
Joe (1997) proposes a class of parametric (strictly) stationary Markov mod- 
els based on parametric copulas and parametric invariant (one-dimensional 
marginal) distributions. Chen and Fan (2006) study a class of semiparamet- 
ric stationary Markov models based on parametric copulas and nonparamet- 
ric invariant distributions. 

Let $\{Y_t\}$ be a stationary Markov process of order one with a continuous
invariant (one-dimensional marginal) distribution $G$. Then its probabilistic
properties are completely determined by the bivariate joint distribution
function of $Y_{t-1}$ and $Y_t$, $H(y_1, y_2)$ (say). By Sklar's theorem [see
McNeil, Frey and Embrechts (2005), Nelsen (2006)], one can uniquely express
$H(\cdot,\cdot)$ in terms of the invariant distribution $G$ and the bivariate
copula function $C(\cdot,\cdot)$ of $Y_{t-1}$ and $Y_t$,

    $H(y_1, y_2) \equiv C(G(y_1), G(y_2))$.

Thus one can always specify a stationary first-order Markov model with 
continuous state space by directly specifying the marginal distribution of Yt 
and the bivariate copula function of $Y_{t-1}$ and $Y_t$. The advantage of the copula
approach is that one can freely choose the marginal distribution and the 
bivariate copula function separately; the former characterizes the marginal 
behavior such as the fat-tails and/or skewness of the time series 
while the latter characterizes all the temporal dependence properties that 
are invariant to any increasing transformations as well as the tail dependence 
properties of the time series. Although it is strictly stationary and
first-order Markov, a model generated via a copula (especially a
tail-dependent copula)
is very flexible. This model can generate a rich array of nonlinear time series 
patterns, including persistent clustering of extreme values via tail dependent 
copulas evaluated at fat-tailed marginals, asymmetric dependence, and other 
"look alike" behaviors present in many popular nonlinear models such as 
ARCH, GARCH, stochastic volatility, near-unit root, long-memory, models 
with structural breaks, Markov switching and so on. From the point of view 
of financial applications, one attractive property of the copula-based Markov
model is that the implied conditional quantiles are automatically monotonic
across quantiles. This nice feature has been exploited by Chen, Koenker and
Xiao (2008) and Bouyé and Salmon (2008) in their study of copula-based
nonlinear quantile autoregression and value at risk (VaR). 
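
As a minimal sketch of the representation $H(y_1, y_2) = C(G(y_1), G(y_2))$
discussed above, the following Python fragment composes a Clayton copula with
a logistic marginal. Both choices, and all function names, are illustrative
assumptions rather than specifications taken from this paper; any absolutely
continuous marginal and copula could be substituted.

```python
import math


def clayton_cdf(u1, u2, alpha):
    """Bivariate Clayton copula C(u1, u2; alpha), alpha > 0, u1, u2 in (0, 1]."""
    return (u1 ** -alpha + u2 ** -alpha - 1.0) ** (-1.0 / alpha)


def logistic_cdf(y):
    """Standard logistic CDF; an arbitrary illustrative choice of marginal G."""
    return 1.0 / (1.0 + math.exp(-y))


def joint_cdf(y1, y2, alpha, marginal_cdf=logistic_cdf):
    """Joint CDF H(y1, y2) = C(G(y1), G(y2)) of the pair (Y_{t-1}, Y_t)."""
    return clayton_cdf(marginal_cdf(y1), marginal_cdf(y2), alpha)


if __name__ == "__main__":
    # Letting y1 grow recovers the marginal: H(y1, y2) approaches G(y2).
    print(joint_cdf(50.0, 0.0, alpha=2.0), logistic_cdf(0.0))
```

The marginal and the copula enter as separate arguments, which mirrors the
point made above that the two ingredients can be chosen independently.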

In this paper, we shall focus on the class of copula-based, strictly
stationary, semiparametric first-order Markov models, in which the true copula
density function has a parametric form ($c(\cdot,\cdot;\alpha_0)$) and the
true invariant distribution is of an unknown form ($G_0(\cdot)$) but is
absolutely continuous with respect to the Lebesgue measure on the real line.
Any model of this class is completely described by two unknown
characteristics: the copula dependence parameter $\alpha_0$ and the invariant
distribution $G_0(\cdot)$. To establish the asymptotic properties of any
semiparametric estimators of $(\alpha_0, G_0)$, one needs to know the temporal
dependence properties of the copula-based Markov models. For this class of
models, Chen and Fan (2006) show that the $\beta$-mixing temporal dependence
measure is purely determined by the properties of the copulas (and does not
depend on the invariant distributions); and Beare (2008) provides simple
sufficient conditions for geometric $\beta$-mixing in terms of copulas without
any tail dependence [such as Gaussian, Frank and
Eyraud-Farlie-Gumbel-Morgenstern (EFGM) copulas]. Neither paper
is able to verify whether or not a Markov process generated via a tail
dependent copula (such as Clayton, survival Clayton, Gumbel, survival Gumbel
or Student's t) is geometric $\beta$-mixing. Ibragimov and Lentzas (2008)
demonstrate via simulation that Clayton copula-based first-order strictly
stationary Markov models could behave as "long memory" in copula levels. In
this paper, we show that Clayton, survival Clayton, Gumbel, survival Gumbel
and Student's t copula-based Markov models are actually geometrically ergodic
(hence geometric $\beta$-mixing). Therefore, according to our theorem,
although a time series plot of a Clayton copula (or survival Clayton, Gumbel,
survival Gumbel or other tail-dependent copula) generated Markov model may
look highly persistent and "long memory alike," it is, in fact, weakly
dependent and "short memory."

In this paper, we propose a sieve maximum likelihood estimation (MLE) 
procedure for the copula parameter $\alpha_0$, the invariant distribution $G_0$ and the
conditional quantiles of a copula-based semiparametric Markov model. This 
procedure approximates the unknown marginal density by flexible paramet- 
ric families of densities with increasing complexity (sieves), and then maxi- 
mizes the joint likelihood with respect to the unknown copula parameter and 
the sieve parameters of the approximating marginal density. We show that 
the sieve MLEs of any smooth functionals of $(\alpha_0, G_0)$ are root-n consistent,
asymptotically normal and efficient; and that their sieve likelihood ratio 
statistics are asymptotically chi-square distributed. We also present simple 
consistent estimators of asymptotic variances of the sieve MLEs of smooth
functionals. It is interesting to note that although the conditional
distribution of a copula-based semiparametric stationary Markov model depends
on the unknown invariant distribution, the plug-in sieve MLE estimators of the
nonlinear conditional quantiles (VaR) are still $\sqrt{n}$-consistent,
asymptotically normal and efficient.

To the best of our knowledge, Atlason (2008) is the only other paper that
also considers the semiparametric efficient estimation of a copula parameter
$\alpha_0$ for a copula-based first-order strictly stationary Markov model.
His work and ours were done at the same time, but independently. While we
propose the sieve likelihood joint estimation of $G_0$ and $\alpha_0$, Atlason
(2008) proposes the rank likelihood estimation of the copula parameter
$\alpha_0$, and relies on a simulation method to evaluate his rank likelihood.
However, Atlason (2008) does not investigate the semiparametric efficient
estimation of the invariant distribution $G_0$ or the conditional quantiles.

Previously, Chen and Fan (2006) proposed a simple two-step estimation
procedure in which one first estimates the invariant CDF $G_0(\cdot)$ by a
re-scaled empirical CDF $G_n$ of the data $\{Y_t\}_{t=1}^n$, and then
estimates the copula parameter $\alpha_0$ by maximizing the pseudo
log-likelihood corresponding to the copula density evaluated at pseudo
observations $\{G_n(Y_t)\}_{t=1}^n$. Chen and Fan's procedure can be viewed as
an extension of the one proposed by Genest, Ghoudi and Rivest (1995) for a
bivariate copula-based joint distribution model of a random sample
$\{(X_i, Y_i)\}_{i=1}^n$ to a univariate first-order Markov model of a time
series data (with $X_i = Y_{i-1}$). Both are semiparametric analogs of the
two-step parametric procedure that is called the "inference functions for
margins" (IFM) in Joe (1997), Chapter 10. Just as the two-step estimator of
Genest, Ghoudi and Rivest (1995) is generally inefficient for a bivariate
random sample [see, e.g., Genest and Werker (2002)], the two-step estimator
of Chen and Fan (2006) is inefficient for a univariate Markov model.
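
The two-step procedure described above can be sketched as follows for a
Clayton copula. This is only an illustration under stated assumptions: the
rescaling $G_n(y) = \frac{1}{n+1}\sum_t 1(Y_t \le y)$, the golden-section
search, its bounds, and every function name are choices made here for the
sketch, and the search presumes that the pseudo log-likelihood is unimodal on
the search interval.

```python
import math


def pseudo_observations(y):
    """Rescaled empirical CDF values G_n(Y_t) = rank(Y_t) / (n + 1); assumes no ties."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1.0)
    return u


def clayton_log_density(u1, u2, alpha):
    """Log of the Clayton copula density c(u1, u2; alpha), alpha > 0."""
    s = u1 ** -alpha + u2 ** -alpha - 1.0
    return (math.log1p(alpha)
            - (alpha + 1.0) * (math.log(u1) + math.log(u2))
            - (2.0 + 1.0 / alpha) * math.log(s))


def pseudo_loglik(alpha, u):
    """Pseudo log-likelihood: sum over t of log c(U_{t-1}, U_t; alpha)."""
    return sum(clayton_log_density(u[t - 1], u[t], alpha) for t in range(1, len(u)))


def two_step_clayton(y, lo=0.05, hi=30.0, tol=1e-6):
    """Step 1: pseudo observations. Step 2: golden-section search over alpha.

    The bounds lo and hi are arbitrary illustrative choices.
    """
    u = pseudo_observations(y)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = pseudo_loglik(c, u), pseudo_loglik(d, u)
    while b - a > tol:
        if fc < fd:  # the maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = pseudo_loglik(d, u)
        else:  # the maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = pseudo_loglik(c, u)
    return 0.5 * (a + b)
```

Because the first step uses only ranks, the estimate is unchanged by any
increasing transformation of the data.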

We present Monte Carlo studies to compare the finite sample perfor- 
mance of our sieve MLE, the two-step estimator of Chen and Fan (2006), 
the correctly specified parametric MLE and the incorrectly specified para- 
metric MLE for Clayton, Gumbel, Gaussian, Frank and EFGM copula-based 
Markov models. Numerous simulation studies demonstrate that the two-step 
estimator of Chen and Fan (2006) is not only inefficient but also severely bi- 
ased (in finite sample) when the time series has strong tail dependence, and 
it leads to a biased and inefficient plug-in estimator of conditional quantiles 
(or VaR). The simulation results indicate that our sieve MLEs perform very 
well; when the copula-based Markov process has strong tail dependence, 
the sieve MLEs have much smaller biases and smaller variances than the 
two-step estimators. 

The rest of this paper is organized as follows. In Section 2, we present 
the class of copula-based semiparametric strictly stationary Markov models
and show that many widely used tail dependent copula-based Markov models are
geometrically $\beta$-mixing. In Section 3, we introduce the sieve MLE,
and obtain its consistency and rate of convergence. Section 4 establishes 
the asymptotic normality and semiparametric efficiency of the sieve MLE. 
Section 5 shows that their sieve likelihood ratio statistics are asymptotically 
chi-square distributed which suggests a simple way to construct confidence 
regions for the copula parameter and other smooth functionals. In Section 6, 
we first briefly review some popular existing estimators. We then conduct 
some simulation studies to compare the finite sample performance of our 
sieve MLE and these alternative estimators. Section 7 briefly concludes. All 
the proofs are relegated to the Appendix. 

Finally, we wish to point out that given the characterization results of 
Darsow, Nguyen and Olsen (1992) and Ibragimov (2009) on higher order 
Markov models via copulas, we can easily extend our sieve MLE method 
and our results for copula-based first-order Markov models to copula-based 
higher order Markov models. For presentational clarity we do not give the 
details here. 

2. Copula-based Markov models. In this section, we first present the 
model and then some implied temporal dependence properties. 

2.1. The model. Darsow, Nguyen and Olsen (1992) provide characteriza- 
tion of first-order Markov processes by bivariate copulas and one-dimensional 
marginal distributions (see Nelsen [(2006), Section 6.4] for a brief review). 
Throughout this paper, we assume that the true data generating process 
(DGP) satisfies the following assumption: 

Assumption M. (DGP): (1) $\{Y_t : t = 1, \ldots, n\}$ is a sample of a
strictly stationary first-order Markov process generated from
$(G_0(\cdot), C(\cdot,\cdot;\alpha_0))$, where $G_0(\cdot)$ is the true
invariant distribution that is absolutely continuous with respect to the
Lebesgue measure on the real line (with its support $\mathcal{Y}$, a nonempty
interval of $\mathcal{R}$); $C(\cdot,\cdot;\alpha_0)$ is the true parametric
copula for $(Y_{t-1}, Y_t)$ up to unknown value $\alpha_0$ and is absolutely
continuous with respect to the Lebesgue measure on $[0,1]^2$. (2) The true
marginal density $g_0(\cdot)$ of $G_0(\cdot)$ is positive on its support
$\mathcal{Y}$; and the true copula density $c(\cdot,\cdot;\alpha_0)$ of
$C(\cdot,\cdot;\alpha_0)$ is positive on $(0,1)^2$.

In Assumption M(1), the assumption of absolute continuity of the bivariate
copula $C(\cdot,\cdot;\alpha_0)$ rules out the Fréchet-Hoeffding upper
($C(u_1,u_2) = \min(u_1,u_2)$) and lower ($C(u_1,u_2) = \max(u_1+u_2-1, 0)$)
bounds, as well as their linear combinations [and, say, shuffles and Min
copulas discussed in Darsow, Nguyen and Olsen (1992)].

Under Assumption M(1), the true conditional probability density function,
$p^0(\cdot|Y^{t-1})$, of $Y_t$ given $Y^{t-1} \equiv (Y_{t-1}, \ldots, Y_1)$,
is given by

    (2.1)  $p^0(\cdot|Y^{t-1}) = h_0(\cdot|Y_{t-1}) \equiv g_0(\cdot)\, c(G_0(Y_{t-1}), G_0(\cdot); \alpha_0)$,

where $h_0(\cdot|Y_{t-1})$ denotes the true conditional density of $Y_t$
given $Y_{t-1}$. Under Assumption M(1), the transformed process
$\{U_t : U_t \equiv G_0(Y_t)\}_{t=1}^n$ is also a strictly stationary
first-order Markov process with uniform marginals and
$C(\cdot,\cdot;\alpha_0)$ the joint distribution of $U_{t-1}$ and $U_t$. Then
$C_{2|1}[\cdot|u;\alpha_0] \equiv \frac{\partial}{\partial u} C(u,\cdot;\alpha_0) \equiv C_1(u,\cdot;\alpha_0)$
is the conditional distribution of $U_t \equiv G_0(Y_t)$, given
$U_{t-1} = u$; and $C_{2|1}^{-1}[q|u;\alpha_0]$ is the $q$th, $q \in (0,1)$,
conditional quantile of $U_t$, given $U_{t-1} = u$.

Note that the conditional density of $Y_t$, given $Y^{t-1}$, is a function of
both the copula density $c(\cdot,\cdot;\alpha_0)$ and the marginal density
$g_0$; hence the $q$th, $q \in (0,1)$, conditional quantile of $Y_t$ given
$Y^{t-1}$ (with $Y_{t-1} = y$) is also a function of both the copula and the
marginal,

    (2.2)  $Q_q^Y(y) = G_0^{-1}(C_{2|1}^{-1}[q | G_0(y); \alpha_0])$.

By definition, $C_{2|1}^{-1}[q|u;\alpha_0]$ is increasing in $q$; hence the
$q$th conditional quantile of $Y_t$ given $Y^{t-1}$, $Q_q^Y(y)$, is also
increasing in $q$.
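
To make (2.2) concrete, here is a small sketch for the Clayton copula, whose
conditional distribution $C_{2|1}$ can be inverted in closed form. The
logistic marginal stands in for the unknown $G_0$ purely for illustration, and
the function names are hypothetical.

```python
import math


def clayton_cond_cdf(v, u, alpha):
    """C_{2|1}[v | u; alpha] = dC(u, v)/du for the Clayton copula."""
    return u ** (-alpha - 1.0) * (u ** -alpha + v ** -alpha - 1.0) ** (-1.0 / alpha - 1.0)


def clayton_cond_quantile(q, u, alpha):
    """Closed-form inverse of clayton_cond_cdf with respect to v."""
    return ((q ** (-alpha / (1.0 + alpha)) - 1.0) * u ** -alpha + 1.0) ** (-1.0 / alpha)


def logistic_cdf(y):
    """Illustrative marginal CDF G (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-y))


def logistic_quantile(p):
    """Inverse of logistic_cdf."""
    return math.log(p / (1.0 - p))


def conditional_quantile(q, y, alpha):
    """Q_q(y) = G^{-1}(C_{2|1}^{-1}[q | G(y); alpha]), following (2.2)."""
    return logistic_quantile(clayton_cond_quantile(q, logistic_cdf(y), alpha))


if __name__ == "__main__":
    # Quantile curves at a fixed y are ordered in q, so they never cross.
    print([round(conditional_quantile(q, 1.0, 2.0), 4) for q in (0.05, 0.5, 0.95)])
```

Since both the inner and the outer maps are increasing in $q$, the ordering of
the printed values illustrates the monotonicity noted above.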

2.2. Tail dependence, temporal dependence. All the dependence mea- 
sures that are invariant under increasing transformations can be expressed 
in terms of copulas [see, e.g., McNeil, Frey and Embrechts (2005) and Nelsen 
(2006)]. For example, Kendall's tau is
$\tau = 4 \int\!\!\int_{[0,1]^2} C(u_1,u_2)\, dC(u_1,u_2) - 1$ and Spearman's
rho is $\rho_S = 12 \int\!\!\int_{[0,1]^2} C(u_1,u_2)\, du_1\, du_2 - 3$. The
lower (resp. upper) tail dependence coefficients $\lambda_L$ (resp.
$\lambda_U$) in terms of copulas are

    $\lambda_L \equiv \lim_{u \to 0^+} \Pr(U_2 \le u | U_1 \le u) = \lim_{u \to 0^+} \frac{C(u,u)}{u}$  and

    $\lambda_U \equiv \lim_{u \to 1^-} \Pr(U_2 \ge u | U_1 \ge u) = \lim_{u \to 1^-} \frac{1 - 2u + C(u,u)}{1 - u}$,

provided the limits exist. [See Kortschak and Albrecher (2009) for examples 
of copulas with nonexisting limits for tail dependence and their applications.] 
For financial risk management, the Markov models generated via tail- 
dependent copulas are much more relevant than models without tail depen- 
dence. In particular, the following three examples have been widely used in 
financial applications: 

Example 2.1 (Clayton copula-based Markov model). The bivariate Clayton
copula is

    $C(u_1,u_2;\alpha) = [u_1^{-\alpha} + u_2^{-\alpha} - 1]^{-1/\alpha}$,  $0 < \alpha < \infty$.

The Clayton copula has Kendall's tau $\tau = \frac{\alpha}{2+\alpha}$ and the
lower tail dependence coefficient $\lambda_L = 2^{-1/\alpha}$ that is
increasing in $\alpha$, but no upper tail dependence. The Clayton copula
becomes the independence copula $C_I(u_1,u_2) = u_1 u_2$ in the limit when
$\alpha \to 0$.

Example 2.2 (Gumbel copula-based Markov model). The bivariate Gumbel
copula is

    $C(u_1,u_2;\alpha) = \exp(-[(-\ln u_1)^{\alpha} + (-\ln u_2)^{\alpha}]^{1/\alpha})$,  $1 < \alpha < \infty$.

The Gumbel copula has Kendall's tau $\tau = 1 - \frac{1}{\alpha}$ and the
upper tail dependence coefficient $\lambda_U = 2 - 2^{1/\alpha}$ that is
increasing in $\alpha$, but no lower tail dependence. The Gumbel copula
becomes the independence copula $C_I(u_1,u_2) = u_1 u_2$ in the limit when
$\alpha \to 1$.
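
The closed-form tail dependence coefficients in Examples 2.1 and 2.2 can be
checked numerically against the limit definitions of $\lambda_L$ and
$\lambda_U$. The sketch below simply evaluates the defining ratios at a point
close to the relevant corner; the evaluation points and function names are
illustrative choices, and a single evaluation is of course only a rough
stand-in for the limit.

```python
import math


def clayton_cdf(u1, u2, alpha):
    """Clayton copula, alpha > 0."""
    return (u1 ** -alpha + u2 ** -alpha - 1.0) ** (-1.0 / alpha)


def gumbel_cdf(u1, u2, alpha):
    """Gumbel copula, alpha >= 1."""
    s = (-math.log(u1)) ** alpha + (-math.log(u2)) ** alpha
    return math.exp(-(s ** (1.0 / alpha)))


def lower_tail_ratio(copula, u):
    """C(u, u) / u, whose limit as u -> 0+ is lambda_L."""
    return copula(u, u) / u


def upper_tail_ratio(copula, u):
    """(1 - 2u + C(u, u)) / (1 - u), whose limit as u -> 1- is lambda_U."""
    return (1.0 - 2.0 * u + copula(u, u)) / (1.0 - u)


if __name__ == "__main__":
    alpha = 2.0
    clayton = lambda a, b: clayton_cdf(a, b, alpha)
    gumbel = lambda a, b: gumbel_cdf(a, b, alpha)
    # Each line prints the numerical ratio next to the closed-form coefficient.
    print(lower_tail_ratio(clayton, 1e-6), 2.0 ** (-1.0 / alpha))
    print(upper_tail_ratio(gumbel, 1.0 - 1e-6), 2.0 - 2.0 ** (1.0 / alpha))
```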

Example 2.3 (Student t copula-based Markov model). The bivariate Student t
copula is

    $C(u_1,u_2;\alpha) = t_{\nu,\rho}(t_{\nu}^{-1}(u_1), t_{\nu}^{-1}(u_2))$,  $\alpha = (\nu,\rho)$, $|\rho| < 1$, $\nu \in (1,\infty]$,

where $t_{\nu,\rho}(\cdot,\cdot)$ is the bivariate Student-t distribution with
mean zero, the correlation matrix having off-diagonal element $\rho$, and
degrees of freedom $\nu$, and $t_{\nu}(\cdot)$ is the CDF of a univariate
Student-t distribution with mean zero and degrees of freedom $\nu$. The
Student t copula has Kendall's tau $\tau = \frac{2}{\pi} \arcsin\rho$ and
symmetric tail dependence
$\lambda_L = \lambda_U = 2 t_{\nu+1}(-\sqrt{(\nu+1)(1-\rho)/(1+\rho)})$ that
is decreasing in $\nu$. The Student t copula becomes a Gaussian copula in the
limit when $\nu \to \infty$.

2.2.1. Geometric $\beta$-mixing. For analyzing asymptotic properties of any
semiparametric estimators of $(\alpha_0, G_0)$, it is convenient to apply
empirical process results for strictly stationary geometrically ergodic (or
geometric $\beta$-mixing) Markov processes. See Appendix A for some
equivalent definitions of $\beta$-mixing and ergodicity for strictly
stationary Markov processes.

Remark 2.1. (1) Under Assumption M, the time series $\{Y_t\}_{t=1}^n$ is
strictly stationary ergodic and is also $\beta$-mixing (see, e.g., Bradley
[(2005), Corollary 3.6] and Chen and Fan (2006)).

(2) Proposition 2.1 of Chen and Fan (2006) presents high-level sufficient
(and almost necessary) conditions in terms of a copula to ensure
$\beta$-mixing with either exponential or polynomial decay. Their
working-paper version points out that their Proposition 2.1 implies the
Markov models based on Gaussian and EFGM copulas are geometric
$\beta$-mixing. However, they do not verify whether any other copulas satisfy
the conditions of their Proposition 2.1.

(3) Beare [(2008), Theorem 3.1 and Remark 3.5] shows that all Markov models
generated via symmetric absolutely continuous copulas with positive and
square integrable copula densities are geometric $\beta$-mixing. In Remark
3.7, he points out that many commonly used bivariate copulas without tail
dependence, such as Gaussian, EFGM, Frank, Gamma, binomial and hypergeometric
copulas, satisfy the conditions of his Theorem 3.1.

(4) Beare [(2008), Theorem 3.2] shows that no bivariate absolutely continuous
copula with a square integrable density has any tail dependence. Although he
shows that a Markov model based on Student's t copula is $\rho$-mixing and
hence is geometrically strong mixing, Beare (2008) does not verify whether a
Markov model generated via any tail dependent copula (such as Clayton, Gumbel
or Student's t copula) is geometrically $\beta$-mixing.

Ibragimov and Lentzas (2008) demonstrate via simulation that Clayton 
copula generated first-order strictly stationary Markov models behave as
"long memory" processes in copula levels when the Clayton copula parameter
$\alpha$ is large. The time series plots (see Figure 1) of such Markov processes
appear to be "long memory alike." (See Section 6.2 on how to simulate 
copula-based first-order stationary Markov time series. The clusterings of 
extremes in Figure 1 are due to tail dependence properties of Clayton and 
Gumbel copulas.) Nevertheless, our next theorem shows that they are in 
fact geometrically ergodic and hence they are "short memory" processes. 
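
For readers who want to reproduce the qualitative picture, the following
sketch simulates a Clayton copula-based Markov chain on the uniform scale by
repeatedly inverting the conditional distribution $C_{2|1}$. It is an
assumption-laden illustration (the seed, the truncation constant `eps` and the
function names are choices made here), not the simulation design of Section
6.2. Applying any quantile function $G^{-1}$ to the output yields a series
with that marginal.

```python
import random


def simulate_clayton_chain(n, alpha, seed=0):
    """Simulate U_1, ..., U_n from a stationary Clayton copula Markov chain.

    Each step draws q ~ Uniform(0, 1) and sets
    U_t = C_{2|1}^{-1}[q | U_{t-1}; alpha], the Clayton conditional quantile.
    """
    rng = random.Random(seed)
    eps = 1e-12  # keeps every draw strictly inside (0, 1)
    u = [rng.uniform(eps, 1.0 - eps)]
    for _ in range(n - 1):
        q = rng.uniform(eps, 1.0 - eps)
        prev = u[-1]
        step = (q ** (-alpha / (1.0 + alpha)) - 1.0) * prev ** -alpha + 1.0
        u.append(step ** (-1.0 / alpha))
    return u


def sample_autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


if __name__ == "__main__":
    chain = simulate_clayton_chain(5000, alpha=5.0, seed=1)
    print([round(sample_autocorrelation(chain, k), 3) for k in (1, 5, 20, 50)])
```

Printing autocorrelations at several lags gives a rough, informal view of how
quickly the dependence fades in a simulated path.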

Theorem 2.1 (Geometric ergodicity). Under Assumption M, the Markov time
series $\{Y_t\}_{t=1}^n$ generated via Clayton copula with
$0 < \alpha < \infty$, Gumbel copula with $1 < \alpha < \infty$, Student's t
copula with $|\rho| < 1$ and $2 < \nu < \infty$, are all geometrically
ergodic (and hence geometrically $\beta$-mixing).

Remark 2.2. If $\{U_t\}_{t=1}^n$ is a $C_U(\cdot,\cdot)$ copula generated
strictly stationary first-order Markov model with uniform marginals, then
$\{V_t \equiv 1 - U_t\}_{t=1}^n$ is also a copula-based strictly stationary
first-order Markov model with uniform marginals and bivariate copula function

    $C_V(v_1,v_2) \equiv \Pr(V_{t-1} \le v_1, V_t \le v_2) = \Pr(U_{t-1} \ge 1 - v_1, U_t \ge 1 - v_2)$
    $= v_1 + v_2 - 1 + C_U(1 - v_1, 1 - v_2) \equiv C_U^{*}(v_1,v_2)$,

which is the survival copula of $C_U(u_1,u_2)$ [see Nelsen (2006)].
Therefore, a copula $C_U(\cdot,\cdot)$ generated strictly stationary
first-order Markov process is geometrically ergodic or $\beta$-mixing with
certain decay speed $\beta_j = o(1)$ if and only if its survival copula
$C_U^{*}(\cdot,\cdot)$ generated Markov process is geometrically ergodic or
$\beta$-mixing with the same decay speed $\beta_j = o(1)$.

By Theorem 2.1 and Remark 2.2, we immediately see that survival Clay- 
ton and survival Gumbel generated first-order stationary Markov processes 
are also geometrically ergodic. 
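
The survival-copula construction in Remark 2.2 is easy to express in code. In
the sketch below (names are hypothetical), `survival_copula` wraps any
bivariate copula function; applying it twice returns the original copula,
which reflects the fact that $U_t = 1 - V_t$ undoes $V_t = 1 - U_t$.

```python
def clayton_cdf(u1, u2, alpha):
    """Clayton copula, used here only as an example input."""
    return (u1 ** -alpha + u2 ** -alpha - 1.0) ** (-1.0 / alpha)


def survival_copula(copula):
    """Return the map (v1, v2) -> v1 + v2 - 1 + C(1 - v1, 1 - v2)."""
    def survival(v1, v2):
        return v1 + v2 - 1.0 + copula(1.0 - v1, 1.0 - v2)
    return survival


if __name__ == "__main__":
    clayton = lambda a, b: clayton_cdf(a, b, 2.0)
    survival_clayton = survival_copula(clayton)
    # The second value repeats the first up to rounding error.
    print(clayton(0.3, 0.6), survival_copula(survival_clayton)(0.3, 0.6))
```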
Fig. 1. Markov time series: tail dependence index = 0.9548, Student ti marginal distribution.



3. Sieve MLE, consistency with rate. Under Assumption M, we see that the true
conditional density $p^0(\cdot|Y^{t-1})$ of $Y_t$ given
$Y^{t-1} \equiv (Y_{t-1}, \ldots, Y_1)$ is given by (2.1). Let

    $p(\cdot|Y^{t-1}) = h(\cdot|Y_{t-1}; \alpha, g) \equiv g(\cdot)\, c(G(Y_{t-1}), G(\cdot); \alpha)$

denote any candidate conditional density of $Y_t$ given $Y^{t-1}$. Let
$Z_t \equiv (Y_{t-1}, Y_t)$, and denote

    $\ell(\alpha, g, Z_t) \equiv \log p(Y_t|Y^{t-1}) \equiv \log\{h(Y_t|Y_{t-1}; \alpha, g)\}$
    $= \log g(Y_t) + \log c(G(Y_{t-1}), G(Y_t); \alpha)$
    $= \log g(Y_t) + \log c\Bigl(\int 1(y \le Y_{t-1}) g(y)\,dy, \int 1(y \le Y_t) g(y)\,dy; \alpha\Bigr)$

as the log-likelihood associated with the conditional density
$p(Y_t|Y^{t-1})$. Here $1(\cdot)$ stands for the indicator function. Then the
joint log-likelihood function of the data $\{Y_t\}_{t=1}^n$ is given by

    $L_n(\alpha, g) \equiv \frac{1}{n} \sum_{t=2}^{n} \ell(\alpha, g, Z_t) + \frac{1}{n} \log g(Y_1)$.
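
As a hedged illustration of the criterion $L_n(\alpha, g)$, the sketch below
evaluates it for a Clayton copula and a user-supplied candidate density `g`
with CDF `G`. The logistic defaults and all function names are assumptions of
this example; in the sieve procedure, `g` would instead range over a sieve
space.

```python
import math


def clayton_log_density(u1, u2, alpha):
    """Log of the Clayton copula density c(u1, u2; alpha), alpha > 0."""
    s = u1 ** -alpha + u2 ** -alpha - 1.0
    return (math.log1p(alpha)
            - (alpha + 1.0) * (math.log(u1) + math.log(u2))
            - (2.0 + 1.0 / alpha) * math.log(s))


def logistic_pdf(y):
    """Logistic density; written with |y| for numerical stability."""
    e = math.exp(-abs(y))
    return e / (1.0 + e) ** 2


def logistic_cdf(y):
    """Logistic CDF, the integral of logistic_pdf."""
    return 1.0 / (1.0 + math.exp(-y))


def joint_log_likelihood(y, alpha, g=logistic_pdf, G=logistic_cdf):
    """L_n(alpha, g) = (1/n) * [sum_{t=2}^n l(alpha, g, Z_t) + log g(Y_1)]."""
    n = len(y)
    total = math.log(g(y[0]))
    for t in range(1, n):
        total += math.log(g(y[t])) + clayton_log_density(G(y[t - 1]), G(y[t]), alpha)
    return total / n
```

Passing `g` and `G` as a pair keeps the candidate density and its CDF in sync,
which matters because the copula term depends on `g` only through `G`.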
The approximate sieve MLE $\hat\gamma_n \equiv (\hat\alpha_n, \hat g_n)$ is
defined as

    (3.1)  $L_n(\hat\alpha_n, \hat g_n) \ge \max_{\alpha \in \mathcal{A},\, g \in \mathcal{G}_n} L_n(\alpha, g) - O_p(\delta_n^2)$,

where $\delta_n$ is a positive sequence such that $\delta_n = o(1)$, and
$\mathcal{G}_n$ denotes the sieve space [i.e., a sequence of finite
dimensional parameter spaces that become dense (as $n \to \infty$) in the
entire parameter space $\mathcal{G}$ for $g_0$].

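There exist many sieves for approximating a univariate probability density
function. In this paper, we will focus on using linear sieves to directly
approximate either a square root density:

    (3.2)  $\mathcal{G}_n = \Bigl\{ g_{K_n} \in \mathcal{G} : g_{K_n}(y) = \Bigl[\sum_{k=1}^{K_n} a_k A_k(y)\Bigr]^2, \int g_{K_n}(y)\,dy = 1 \Bigr\}$,  $K_n \to \infty$, $\frac{K_n}{n} \to 0$,

or a log density:

    (3.3)  $\mathcal{G}_n = \Bigl\{ g_{K_n} \in \mathcal{G} : g_{K_n}(y) = \exp\Bigl\{\sum_{k=1}^{K_n} a_k A_k(y)\Bigr\}, \int g_{K_n}(y)\,dy = 1 \Bigr\}$,  $K_n \to \infty$, $\frac{K_n}{n} \to 0$,

where $\{A_k(\cdot) : k \ge 1\}$ consists of known basis functions, and
$\{a_k : k \ge 1\}$ is the collection of unknown sieve coefficients.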
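
A member of the log-density sieve (3.3) can be sketched as follows on the
support $[0, 1]$. The cosine basis, the midpoint-rule normalization and the
function names are illustrative assumptions; the text above only requires the
$A_k$ to be known basis functions.

```python
import math


def cosine_basis(k, y):
    """A_k(y) = sqrt(2) * cos(k * pi * y) on [0, 1]; one possible basis choice."""
    return math.sqrt(2.0) * math.cos(k * math.pi * y)


def make_log_sieve_density(coefs, n_grid=2000):
    """Density proportional to exp(sum_k a_k A_k(y)) on [0, 1], as in (3.3).

    The normalizing constant is approximated with a midpoint rule so that the
    returned density integrates to (approximately) one.
    """
    def unnormalized(y):
        return math.exp(sum(a * cosine_basis(k, y) for k, a in enumerate(coefs, start=1)))

    h = 1.0 / n_grid
    total = h * sum(unnormalized((i + 0.5) * h) for i in range(n_grid))

    def density(y):
        return unnormalized(y) / total if 0.0 <= y <= 1.0 else 0.0

    return density


if __name__ == "__main__":
    g = make_log_sieve_density([0.5, -0.2, 0.1])  # K_n = 3 sieve terms
    print(g(0.1), g(0.5), g(0.9))
```

Exponentiating guarantees positivity for any coefficient vector, so the only
constraint left to enforce numerically is the normalization.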

Suppose the support $\mathcal{Y}$ (of the true $g_0$) is either a compact
interval (say $[0, 1]$) or the whole real line $\mathcal{R}$. Let $r > 0$ be a
real-valued number, and $[r] \ge 0$ be the largest integer such that
$[r] < r$. A real-valued function $g$ on $\mathcal{Y}$ is said to be
$r$-smooth if it is $[r]$ times continuously differentiable on $\mathcal{Y}$,
and its $[r]$th derivative satisfies a Hölder condition with exponent
$r - [r] \in (0, 1]$ (i.e., there is a positive number $K$ such that
$|D^{[r]} g(y) - D^{[r]} g(y')| \le K |y - y'|^{r - [r]}$ for all
$y, y' \in \mathcal{Y}$. Here $D^{[r]}$ stands for the differential operator).
We denote $\Lambda^r(\mathcal{Y})$ as the class of all real-valued functions
on $\mathcal{Y}$ which are $r$-smooth; it is called a Hölder space.

Let the true marginal density function $g_0$ satisfy either
$\sqrt{g_0} \in \Lambda^r(\mathcal{Y})$ or
$\log g_0 \in \Lambda^r(\mathcal{Y})$. Then any function in
$\Lambda^r(\mathcal{Y})$ can be approximated by some appropriate sieve spaces.
For example, if $\mathcal{Y}$ is a bounded interval and $r > 1/2$, it can be
approximated by the spline sieve $\mathrm{Spl}(s, K_n)$ with $s > [r]$, the
polynomial sieve, the trigonometric sieve, the cosine series and so on. When
the support $\mathcal{Y}$ is unbounded, a thin-tailed density can be
approximated by a Hermite polynomial sieve, while a polynomial fat-tailed
density can be approximated by a spline wavelet sieve. See Chen (2007) for
detailed descriptions of various sieve spaces $\mathcal{G}_n$. In our
simulation study, we choose the number of sieve terms $K_n$ using a modified
AIC, although one could also use cross-validation [see, e.g., Fan and Yao
(2003), Gao (2007), Li and Racine (2007)] and other computationally more
intensive model selection methods [see, e.g., Shen, Huang and Ye (2004)] to
choose $K_n$. See Chen, Fan and Tsyrennikov (2006) for further discussions.

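3.1. Consistency. In the following, we denote
$Q_n(\alpha, g) \equiv \frac{n-1}{n} E_0[\ell(\alpha, g, Z_2)] + \frac{1}{n} E_0[\log g(Y_1)]$,
where $E_0$ is the expectation under the true DGP (i.e., Assumption M).
Denote $\gamma \equiv (\alpha, g)$ and
$\gamma_0 \equiv (\alpha_0, g_0) \in \Gamma \equiv \mathcal{A} \times \mathcal{G}$.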

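Assumption 3.1. (1) $\alpha_0 \in \mathcal{A}$, where $\mathcal{A}$ is a
compact set of $\mathcal{R}^d$ with a nonempty interior,
$c(u_1,u_2;\alpha) > 0$ for all $(u_1,u_2) \in (0,1)^2$,
$\alpha \in \mathcal{A}$; (2) $g_0 \in \mathcal{G}$, either
$\mathcal{G} = \{g = f^2 > 0 : f \in \Lambda^r(\mathcal{Y}), \int g(y)\,dy = 1\}$
and $\mathcal{G}_n$ given in (3.2), or
$\mathcal{G} = \{g = \exp(f) > 0 : f \in \Lambda^r(\mathcal{Y}), \int g(y)\,dy = 1\}$
and $\mathcal{G}_n$ given in (3.3), $r > 1/2$; (3)
$Q_n(\alpha_0, g_0) > -\infty$, there is the metric
$\|\gamma\|_c \equiv \sqrt{\alpha'\alpha} + \|g\|_c$ on
$\Gamma \equiv \mathcal{A} \times \mathcal{G}$ and a positive measurable
function $\eta(\cdot)$ such that for all $\varepsilon > 0$ and for all
$k \ge 1$,

    $Q_n(\alpha_0, g_0) - \sup_{\{\gamma \in \Gamma_k : \|\gamma - \gamma_0\|_c \ge \varepsilon\}} Q_n(\alpha, g) \ge \eta(\varepsilon) > 0$;

(4) the sieve spaces $\mathcal{G}_n$ are compact under the metric $\|g\|_c$;
(5) there is $\Pi_n\gamma_0 \in \Gamma_n \equiv \mathcal{A} \times \mathcal{G}_n$
such that $\|\Pi_n\gamma_0 - \gamma_0\|_c = o(1)$; and
$|Q_n(\Pi_n\gamma_0) - Q_n(\gamma_0)| = o(1)$.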

For the norm $\|\gamma\|_c \equiv \sqrt{\alpha'\alpha} + \|g\|_c$ on
$\Gamma \equiv \mathcal{A} \times \mathcal{G}$, one can use either a sup norm
$\|g\|_{\infty}$ (or a weighted sup norm) or even a lower order Hölder norm
$\|g\|_{\Lambda^{r'}}$ for $r' \in [0, r)$ (or its weighted version).

Assumption 3.2. (1) $E_0[\sup_{\gamma \in \Gamma_n} |\ell(\gamma, Z_t)|]$ is
bounded; (2) there is a finite constant $\kappa > 0$ and a measurable function
$M(\cdot)$ with $E_0[M(Z_t)] \le$ const. $< \infty$, such that for all
$\delta > 0$,

    $\sup_{\{\gamma, \gamma_1 \in \Gamma_n : \|\gamma - \gamma_1\|_c \le \delta\}} |\ell(\gamma, Z_t) - \ell(\gamma_1, Z_t)| \le \delta^{\kappa} M(Z_t)$  a.s.-$Z_t$.

We note that under Assumption 3.1(1), (4), Assumption 3.2(1) is implied
by Assumption 3.2(2).

Proposition 3.1. Under Assumptions M, 3.1 and 3.2, $\delta_n = o(1)$,
$K_n \to \infty$ and $\frac{K_n}{n} \to 0$, we have

    $\|\hat\gamma_n - \gamma_0\|_c = o_p(1)$.






X. CHEN, W. B. WU AND Y. YI 



3.2. Convergence rate. Given the consistency result Proposition 3.1,
$\psi_n := \inf\{h > 0 : \Pr(\|\hat\gamma_n - \gamma_0\|_c > h) \le h\}$, the
Lévy distance between $\|\hat\gamma_n - \gamma_0\|_c$ and 0, converges to 0.
Let $\mathcal{N} \equiv \{\gamma \in \Gamma : \|\gamma - \gamma_0\|_c \le \psi_n\}$
be the new parameter space, and the corresponding shrinking neighborhood in
the sieve space, denoted as $\mathcal{N}_n \equiv \mathcal{N} \cap \Gamma_n$,
be the new sieve parameter space. Denote $\mathrm{Var}_0$ as the variance
under the true DGP (i.e., Assumption M).

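Assumption 3.3. (1) There is a metric
$\|\gamma\|_s \equiv \sqrt{\alpha'\alpha} + \|g\|_s$ on $\mathcal{N}$ such
that $\|\gamma\|_s \le \|\gamma\|_c$, and a constant $J_0 > 0$ such that for
all $\varepsilon > 0$ and for all $n \ge 1$,

    $Q_n(\alpha_0, g_0) - \sup_{\{\gamma \in \mathcal{N}_n : \|\gamma_0 - \gamma\|_s \ge \varepsilon\}} Q_n(\alpha, g) \ge J_0 \varepsilon^2 > 0$.

(2) $\sup_{\{\gamma \in \mathcal{N}_n : \|\gamma_0 - \gamma\|_s \le \varepsilon\}} \mathrm{Var}_0(\ell(\gamma, Z_t) - \ell(\gamma_0, Z_t)) \le$ const. $\times\, \varepsilon^2$
for all small $\varepsilon > 0$.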



Assumption 3.3 suggests that a natural choice of $\|\gamma\|_s$ could be
$(Q_n(\gamma_0) - Q_n(\gamma))^{1/2}$.

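Assumption 3.4. (1) $\{Y_t\}_{t=1}^n$ is geometrically ergodic (hence
geometrically $\beta$-mixing); (2) there is a constant $\kappa \in (0,2)$ and
a measurable function $M(\cdot)$ with
$E_0[M(Z_t)^2 \log(1 + M(Z_t))] \le$ const. $< \infty$, such that for any
$\delta > 0$,

    $\sup_{\{\gamma \in \mathcal{N}_n : \|\gamma_0 - \gamma\|_s \le \delta\}} |\ell(\gamma, Z_t) - \ell(\gamma_0, Z_t)| \le \delta^{\kappa} M(Z_t)$  a.s.-$Z_t$.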

Although we do not need any $\beta$-mixing decay rates to establish
consistency in Proposition 3.1, we need some $\beta$-mixing decay rates for
the rate of convergence.^1 Given the results in Section 2.2.1, Assumption
3.4(1) is typically satisfied by copula-based Markov models. Note that in
Assumption 3.4(2), the moment restriction on the envelope function $M(Z_t)$
is weaker than the one ($E_0[M(Z_t)^{\zeta}] \le$ const. $< \infty$ for some
$\zeta > 2$) imposed in Chen and Shen (1998). This is because Chen and Shen
(1998) only assume $\beta$-mixing with polynomial decay speed while our
Assumption 3.4(1) assumes geometric $\beta$-mixing. It is well known that
there are trade-offs between the speed of mixing decay and the finiteness of
moments [see, e.g., Doukhan, Massart and Rio (1995) and Nze and Doukhan
(2004)]. Assumption 3.4(2) is a very weak regularity condition and is
satisfied whenever
$\sup_{\{v \in [0,1],\, \gamma \in \mathcal{N}_n : \|\gamma_0 - \gamma\|_s \le \delta\}} \bigl| \frac{d\ell(\gamma_0 + v[\gamma - \gamma_0], Z_t)}{dv} \bigr| \le \delta^{\kappa} M(Z_t)$
with $M(Z_t)$ having a finite moment of order slightly higher than two, which
is satisfied by all the copula-based Markov models that satisfy the
regularity conditions in Chen and Fan (2006) for semiparametric two-step
estimators.

^1 It is common to assume some $\beta$-mixing or strong mixing decay rates in
semi/nonparametric estimation and testing [see, e.g., Robinson (1983),
Andrews (1994), Fan and Yao (2003), Gao (2007), Li and Racine (2007)].

The next proposition is a direct application of Theorem 1 of Chen and 
Shen (1998), hence we omit its proof. 

Proposition 3.2. Under Assumptions M, 3.1-3.4, we have
$\|\hat\gamma_n - \gamma_0\|_s = O_p(\delta_n)$,
$\delta_n = \max\bigl\{\sqrt{K_n/n},\ \|\gamma_0 - \Pi_n\gamma_0\|_s\bigr\} = o(1)$.
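
To see how the two terms in $\delta_n$ trade off, the following worked
calculation assumes (this is not part of Proposition 3.2 itself) that the
sieve approximation error decays like $K_n^{-r}$, as is typical when an
$r$-smooth function is approximated by splines or polynomials. Under that
assumption the rate follows from balancing the two terms:

```latex
\delta_n = \max\Bigl\{\sqrt{K_n/n},\; K_n^{-r}\Bigr\}, \qquad
\sqrt{K_n/n} \asymp K_n^{-r}
\;\Longleftrightarrow\; K_n^{\,r+1/2} \asymp n^{1/2}
\;\Longleftrightarrow\; K_n \asymp n^{1/(2r+1)}
\;\Longrightarrow\; \delta_n \asymp n^{-r/(2r+1)} .
```

For instance, $r = 2$ would give $K_n \asymp n^{1/5}$ and
$\delta_n \asymp n^{-2/5}$ under this assumed approximation rate.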

4. Normality and efficiency of sieve MLE of smooth functionals. Let
$\rho : \mathcal{A} \times \mathcal{G} \to \mathcal{R}$ be a smooth functional
and $\rho(\hat\gamma_n)$ be the plug-in sieve MLE of $\rho(\gamma_0)$. In this
section, we extend the results of Chen, Fan and Tsyrennikov (2006) on root-n
normality and efficiency of their sieve MLE for copula-based multivariate
joint distribution models using i.i.d. data to our scalar strictly stationary
first-order Markov setting.

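4.1. $\sqrt{n}$-asymptotic normality of $\rho(\hat\gamma_n)$. Recall that
$\delta_n$ is the speed of convergence of $\|\hat\gamma_n - \gamma_0\|_s$ to
zero in probability, let
$\mathcal{N}_0 \equiv \{\gamma \in \mathcal{N} : \|\gamma_0 - \gamma\|_s \le \delta_n \log \delta_n^{-1}\}$
and
$\mathcal{N}_{0n} \equiv \{\gamma \in \mathcal{N}_n : \|\gamma_0 - \gamma\|_s \le \delta_n \log \delta_n^{-1}\}$,
then $\hat\gamma_n \in \mathcal{N}_{0n}$ with probability approaching one.
Also denote $(U_1, U_2) \equiv (G_0(Y_1), G_0(Y_2))$,
$u \equiv (u_1, u_2) \in [0,1]^2$ and
$c(G_0(Y_{t-1}), G_0(Y_t); \alpha_0) = c(U; \alpha_0) = c(\gamma_0, Z_t)$
(with the danger of slightly abusing notation).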

Assumption 4.1. $\alpha_0 \in \mathrm{int}(\mathcal{A})$.

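Assumption 4.2. The second-order partial derivatives
$\frac{\partial^2 \log c(u;\alpha)}{\partial\alpha\,\partial\alpha'}$,
$\frac{\partial^2 \log c(u;\alpha)}{\partial u_j\,\partial\alpha}$,
$\frac{\partial^2 \log c(u;\alpha)}{\partial u_j\,\partial u_k}$
for $k, j = 1, 2$, are all well defined and continuous in
$\gamma \in \mathcal{N}_0$.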

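Denote $\mathbf{V}$ as the linear span of $\Gamma - \{\gamma_0\}$. Under
Assumption 4.2, for any $v = (v_\alpha', v_g)' \in \mathbf{V}$, we see that
$\ell(\gamma_0 + \eta v, Z)$ is continuously differentiable in
$\eta \in [0,1]$. For any $\gamma \in \mathcal{N}_0$, define the first-order
directional derivative of $\ell(\gamma, Z_t)$ at the direction
$v \in \mathbf{V}$ as

    $\frac{\partial\ell(\gamma, Z_t)}{\partial\gamma'}[v] \equiv \frac{d\ell(\gamma + \eta v, Z_t)}{d\eta}\Big|_{\eta=0}$
    $= \frac{\partial \log c(\gamma, Z_t)}{\partial\alpha'} v_\alpha + \frac{v_g(Y_t)}{g(Y_t)} + \sum_{j=1}^{2} \frac{\partial \log c(\gamma, Z_t)}{\partial u_j} \int 1(y \le Y_{t-2+j})\, v_g(y)\,dy$,

and the second-order directional derivative as

    $\frac{\partial^2\ell(\gamma, Z_t)}{\partial\gamma\,\partial\gamma'}[v, \tilde v] \equiv \frac{d}{d\tilde\eta}\Bigl(\frac{\partial\ell(\gamma + \tilde\eta\tilde v, Z_t)}{\partial\gamma'}[v]\Bigr)\Big|_{\tilde\eta=0} = \frac{d^2\ell(\gamma + \eta v + \tilde\eta\tilde v, Z_t)}{d\eta\, d\tilde\eta}\Big|_{\eta=0}\Big|_{\tilde\eta=0}$.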

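Assumption 4.3. (1)
$0 < E_0\bigl[\bigl(\frac{\partial\ell(\gamma_0, Z_t)}{\partial\gamma'}[v]\bigr)^2\bigr] < \infty$
for $v \ne 0$, $v \in \mathbf{V}$; (2)
$\int \sup_{\eta \in S_v} \bigl|\frac{dh(y|Y_{t-1}; \gamma_0 + \eta v)}{d\eta}\bigr|\,dy < \infty$
and
$\int \sup_{\eta \in S_v} \bigl|\frac{d^2 h(y|Y_{t-1}; \gamma_0 + \eta v)}{d\eta^2}\bigr|\,dy < \infty$
almost surely, for
$S_v \equiv \{\eta \in [0,1] : \gamma_0 + \eta v \in \mathcal{N}_0\}$,
$v \ne 0$, $v \in \mathbf{V}$.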



Assumption 4.3(2) is a condition that is assumed even for parametric 
Markov models similar to those in Joe [(1997), Chapter 10] and Billingsley 
(1961b). 



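Lemma 4.1. Under Assumptions M, 3.1(1), (2), 4.1, 4.2 and 4.3, we have, for
any $v \in \mathbf{V}$: (1)
$E_0\bigl(\bigl(\frac{\partial\ell(\gamma_0, Z_t)}{\partial\gamma'}[v]\bigr)\bigl(\frac{\partial\ell(\gamma_0, Z_s)}{\partial\gamma'}[\tilde v]\bigr)\bigr) = 0$
for $\tilde v \in \mathbf{V}$ and all $s < t$. (2)
$\bigl\{\frac{\partial\ell(\gamma_0, Z_t)}{\partial\gamma'}[v]\bigr\}_{t=1}^n$
is a martingale difference sequence with respect to the filtration
$\mathcal{F}_{t-1} = \sigma(Y_1; \ldots; Y_{t-1})$. (3)
$E_0\bigl(\bigl(\frac{\partial\ell(\gamma_0, Z_t)}{\partial\gamma'}[v]\bigr)^2\bigr) = -E_0\bigl(\frac{\partial^2\ell(\gamma_0, Z_t)}{\partial\gamma\,\partial\gamma'}[v, v]\bigr)$.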



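Lemma 4.1 suggests that we can define the Fisher inner product on the space
$\mathbf{V}$ as

    $\langle v, \tilde v \rangle \equiv E_0\Bigl[\Bigl(\frac{\partial\ell(\gamma_0, Z_t)}{\partial\gamma'}[v]\Bigr)\Bigl(\frac{\partial\ell(\gamma_0, Z_t)}{\partial\gamma'}[\tilde v]\Bigr)\Bigr]$

and the Fisher norm for $v \in \mathbf{V}$ as
$\|v\|^2 \equiv \langle v, v \rangle$. Let $\overline{\mathbf{V}}$ be the
closed linear span of $\mathbf{V}$ under the Fisher norm. Then
$(\overline{\mathbf{V}}, \|\cdot\|)$ is a Hilbert space.

The asymptotic properties of $\rho(\hat\gamma_n)$ depend on the smoothness of
the functional $\rho$ and the rate of convergence of $\hat\gamma_n$. For any
$v \in \mathbf{V}$, we denote

    $\frac{\partial\rho(\gamma_0)}{\partial\gamma'}[v] \equiv \frac{d\rho(\gamma_0 + \eta v)}{d\eta}\Big|_{\eta=0}$

whenever the limit is well defined.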



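Assumption 4.4. (1) For any $v \in \mathbf{V}$, $\rho(\gamma_0 + \eta v)$ is
continuously differentiable in $\eta \in [0,1]$ near $\eta = 0$, and

    $\Bigl\|\frac{\partial\rho(\gamma_0)}{\partial\gamma'}\Bigr\| \equiv \sup_{v \in \mathbf{V} : \|v\| > 0} \frac{\bigl|\frac{\partial\rho(\gamma_0)}{\partial\gamma'}[v]\bigr|}{\|v\|} < \infty$;

(2) there exist constants $c > 0$, $\omega > 0$, and a small
$\varepsilon > 0$ such that

    $\Bigl|\rho(\gamma_0 + v) - \rho(\gamma_0) - \frac{\partial\rho(\gamma_0)}{\partial\gamma'}[v]\Bigr| \le c\|v\|^{\omega}$  for any $v \in \mathbf{V}$ with $\|v\| \le \varepsilon$.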

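Under this assumption, by the Riesz representation theorem, there exists a
$v^* \in \overline{\mathbf{V}}$ such that

    (4.1)  $\frac{\partial\rho(\gamma_0)}{\partial\gamma'}[v] = \langle v^*, v \rangle$  for all $v \in \mathbf{V}$

and

    $\|v^*\|^2 = \sup_{v \in \mathbf{V} : \|v\| > 0} \frac{\bigl|\frac{\partial\rho(\gamma_0)}{\partial\gamma'}[v]\bigr|^2}{\|v\|^2} < \infty$.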

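Assumption 4.5. (1) $\|\hat\gamma_n - \gamma_0\| = O_P(\delta_n)$ for a decreasing sequence $\delta_n$ satisfying $(\delta_n)^{\omega} = o(n^{-1/2})$; (2) there exists $\Pi_n v^* \in \Gamma_n - \{\gamma_0\}$ such that $\delta_n \times \|\Pi_n v^* - v^*\| = o(n^{-1/2})$.

Assumption 4.6. For all $\tilde\gamma \in \mathcal{N}_{0n}$ with $\|\tilde\gamma - \gamma_0\| = O(\delta_n)$ and all $v = (v_\alpha', v_g)' \in \mathbf{V}$ with $\|v\| = O(\delta_n)$ we have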



For parametric likelihood models, Assumption 4.6 is automatically satisfied as long as the second-order derivatives of the log-likelihood are continuous in a shrinking neighborhood of the true parameter value. For sieve MLEs, Assumption 4.6 is satisfied provided that the third-order directional derivative $\frac{\partial^3\ell(\tilde\gamma+\eta[v],Z_t)}{\partial\eta^3}$ is well behaved for $\eta \in [0,1]$ and $\tilde\gamma \in \mathcal{N}_{0n}$ with $\|\tilde\gamma - \gamma_0\| = O(\delta_n)$, and the sieve MLE convergence rate $\delta_n$ is not too slow. For example, under Assumption 3.1(2) with polynomial, Fourier series, spline or wavelet sieves, we have a sieve MLE convergence rate of $\delta_n = n^{-r/(2r+1)}$ [see, e.g., Shen (1997) for i.i.d. data, and Chen and Shen (1998) for $\beta$-mixing time series data], and hence Assumption 4.6 is satisfied if $r > 1$.
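
The threshold $r > 1$ can be traced with a line of arithmetic. The sketch below assumes (this is our reading of the preceding paragraph, not a statement taken from the text) that the condition ultimately needed is a third-order remainder of order $\delta_n^3 = o(n^{-1})$:

```latex
\delta_n^{3} \;=\; n^{-3r/(2r+1)} \;=\; o(n^{-1})
\;\iff\; \frac{3r}{2r+1} > 1
\;\iff\; 3r > 2r + 1
\;\iff\; r > 1 .
```

Under that reading, a smoother marginal density (larger $r$) gives a faster sieve rate and leaves more room in the expansion.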

Assumption 4.7. $\{\frac{\partial\ell(\gamma,Z_t)}{\partial\gamma'}[\Pi_n v^*] : \gamma \in \mathcal{N}_0, \|\gamma-\gamma_0\| = O(\delta_n)\}$ is a Donsker class.

Under Assumption 3.4(1), Assumption 4.7 is satisfied by applying the results of Doukhan, Massart and Rio (1995) on Donsker theorems for strictly stationary $\beta$-mixing processes.

Theorem 4.1 (Normality). Suppose that Assumptions M, 3.1-3.4 and 4.1-4.7 hold. Then $\sqrt{n}(\rho(\hat\gamma_n) - \rho(\gamma_0)) \to_d N(0, \|v^*\|^2)$.

4.2. Semiparametric efficiency of $\rho(\hat\gamma_n)$. We follow the approach of Wong (1992) to establish semiparametric efficiency. Related work can be found in Shen (1997), Bickel et al. (1993) and Bickel and Kwon (2001) and the references therein. Recall that a probability family $\{P_\gamma : \gamma \in \Gamma\}$ for the sample $\{Y_t\}_{t=1}^{n}$ is locally asymptotically normal (LAN) at $\gamma_0$, if (1) for any $v$ in the linear span of $\Gamma - \{\gamma_0\}$, $\gamma_0 + \eta n^{-1/2}v \in \Gamma$ for all small $\eta \ge 0$, and (2)

$$\frac{dP_{\gamma_0+n^{-1/2}v}}{dP_{\gamma_0}}(Y_1,\ldots,Y_n) = \exp\left\{n\left[L_n\left(\gamma_0 + \frac{1}{\sqrt n}v\right) - L_n(\gamma_0)\right]\right\} = \exp\left\{\Sigma_n(v) - \frac12\|v\|^2 + R_n(\gamma_0,v)\right\},$$

where $\Sigma_n(v)$ is linear in $v$, $\Sigma_n(v) \to_d N(0, \|v\|^2)$ and $\operatorname{plim}_{n\to\infty} R_n(\gamma_0, v) = 0$ (both limits are under the true probability measure $P_{\gamma_0}$). To avoid the "super-efficiency" phenomenon, certain regularity conditions on the estimates are required. In estimating a smooth functional in the infinite-dimensional case, Wong [(1992), page 58] defines the class of pathwise regular estimates. An estimate $T_n(Y_1,\ldots,Y_n)$ of $\rho(\gamma_0)$ is pathwise regular if for any real number $\eta > 0$ and any $v$ in the linear span of $\Gamma - \{\gamma_0\}$, we have

$$\limsup_{n\to\infty} P_{\gamma_{n,\eta}}(T_n < \rho(\gamma_{n,\eta})) \le \liminf_{n\to\infty} P_{\gamma_{n,-\eta}}(T_n < \rho(\gamma_{n,-\eta})),$$

where $\gamma_{n,\eta} = \gamma_0 + \eta n^{-1/2}v$ [see Wong (1992) and Shen (1997) for details].

Theorem 4.2 (Efficiency). Under the conditions in Theorem 4.1, we have LAN, and the plug-in sieve MLE $\rho(\hat\gamma_n)$ achieves the efficiency lower bound for pathwise regular estimates.



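4.3. $\sqrt{n}$ normality and efficiency of sieve MLE of copula parameter. We take $\rho(\gamma) = \lambda'\alpha$ for any arbitrarily fixed $\lambda \in \mathcal{R}^d$ with $0 < |\lambda| < \infty$. It satisfies Assumption 4.4(2) with $\frac{\partial\rho(\gamma_0)}{\partial\gamma'}[v] = \lambda' v_\alpha$ and $\omega = \infty$. Assumption 4.4(1) is equivalent to finding a Riesz representer $v^* \in \overline{\mathbf{V}}$ satisfying (4.2) and (4.3),

(4.2) $\quad \lambda'(\alpha - \alpha_0) = \langle \gamma - \gamma_0, v^*\rangle$ for any $\gamma - \gamma_0 \in \mathbf{V}$

and

(4.3) $\quad \left\|\dfrac{\partial\rho(\gamma_0)}{\partial\gamma'}\right\|^2 = \|v^*\|^2 = \langle v^*, v^*\rangle = \sup_{v\neq 0,\, v\in\mathbf{V}}\dfrac{|\lambda' v_\alpha|^2}{\|v\|^2} < \infty.$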



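Let us change the variables before making statements on (4.3). Denote

$$\mathcal{L}_2^0([0,1]) \equiv \left\{e : [0,1] \to \mathcal{R} : \int_0^1 e(v)\,dv = 0,\ \int_0^1 [e(v)]^2\,dv < \infty\right\}.$$

By changing variables, for any $v_g \in \mathbf{V}_g$, there is a unique function $b_g \in \mathcal{L}_2^0([0,1])$ with $b_g(u) = v_g(G_0^{-1}(u))/g_0(G_0^{-1}(u))$, and vice versa. So we can express $\frac{\partial\ell(\gamma_0,Z_t)}{\partial\gamma'}[v]$ as

$$\frac{\partial\ell(\gamma_0,Z_t)}{\partial\gamma'}[v] = \frac{\partial\ell(\gamma_0,U_t,U_{t-1})}{\partial\gamma'}[(v_\alpha',b_g)'] = \frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha'}v_\alpha + b_g(U_t) + \sum_{j=1}^{2}\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial u_j}\int_0^{U_{t-2+j}} b_g(u)\,du$$

and

$$\|v\|^2 = E_0\left[\left(\frac{\partial\ell(\gamma_0,U_t,U_{t-1})}{\partial\gamma'}[(v_\alpha',b_g)']\right)^2\right] = E_0\left[\left(\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha'}v_\alpha + b_g(U_t) + \sum_{j=1}^{2}\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial u_j}\int_0^{U_{t-2+j}} b_g(u)\,du\right)^2\right].$$

Define

$$\mathbf{B} = \left\{b = (v_\alpha', b_g)' \in (\mathcal{A} - \alpha_0)\times\mathcal{L}_2^0([0,1]) : \|b\|^2 \equiv E_0\left[\left(\frac{\partial\ell(\gamma_0,U_t,U_{t-1})}{\partial\gamma'}[b]\right)^2\right] < \infty\right\}.$$

Then there is a one-to-one onto mapping between the two Hilbert spaces $(\overline{\mathbf{B}}, \|\cdot\|)$ and $(\overline{\mathbf{V}}, \|\cdot\|)$. So the Riesz representer $v^* = (v_\alpha^{*\prime}, v_g^*)' \in \overline{\mathbf{V}}$ is uniquely determined by $b^* = (v_\alpha^{*\prime}, b_g^*)' \in \overline{\mathbf{B}}$ (and vice versa) via the relation $v_g^*(y) = b_g^*(G_0(y))g_0(y)$ for all $y \in \mathcal{Y}$. Notice that

$$\|v^*\|^2 = \sup_{v\neq 0,\,v\in\mathbf{V}}\frac{|\lambda' v_\alpha|^2}{\|v\|^2} = \sup_{b\neq 0,\,b\in\mathbf{B}}\frac{|\lambda' v_\alpha|^2}{E_0\left[\left(\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha'}v_\alpha + b_g(U_t) + \sum_{j=1}^{2}\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial u_j}\int_0^{U_{t-2+j}} b_g(u)\,du\right)^2\right]} = \lambda'\left(E_0[S_{\alpha_0}S_{\alpha_0}']\right)^{-1}\lambda,$$

where $S_{\alpha_0}$ is the efficient score function for $\alpha_0$,

(4.4) $\quad S_{\alpha_0} = \dfrac{\partial\log c(\alpha_0,U_t,U_{t-1})}{\partial\alpha} - e^*(U_t) - \sum_{j=1}^{2}\dfrac{\partial\log c(\alpha_0,U_t,U_{t-1})}{\partial u_j}\int_0^{U_{t-2+j}} e^*(u)\,du,$

and $e^* = (e_1^*,\ldots,e_d^*) \in (\mathcal{L}_2^0([0,1]))^d$ solves the following infinite-dimensional optimization problems for $k = 1,\ldots,d$,

$$\inf_{e_k\in\mathcal{L}_2^0([0,1])} E_0\left[\left(\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha_k} - e_k(U_t) - \sum_{j=1}^{2}\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial u_j}\int_0^{U_{t-2+j}} e_k(u)\,du\right)^2\right].$$

Therefore, $b^* = (v_\alpha^{*\prime}, b_g^*)'$ with $v_\alpha^* = \mathcal{I}_*(\alpha_0)^{-1}\lambda$ and $b_g^*(u) = -e^*(u)\times v_\alpha^*$, and $v^* = [I_d, -e^*(G_0(\cdot))g_0(\cdot)]' \times \mathcal{I}_*(\alpha_0)^{-1}\lambda$. Hence (4.3) is satisfied if and only if $\mathcal{I}_*(\alpha_0) = E_0[S_{\alpha_0}S_{\alpha_0}']$ is nonsingular, which in turn is satisfied under the following assumption: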



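Assumption 4.4'. (1) $\int\frac{\partial c(u;\alpha_0)}{\partial\alpha}\,du_{-j} = \frac{\partial}{\partial\alpha}\int c(u;\alpha_0)\,du_{-j} = 0$ for $(j,-j) = (1,2)$ with $j \neq -j$; (2) $\Sigma_{ideal} \equiv E_0\left(\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha}\left\{\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha}\right\}'\right)$ is finite and positive definite; (3) $\int\frac{\partial^2 c(u;\alpha_0)}{\partial\alpha\,\partial\alpha'}\,du_{-j} = \frac{\partial^2}{\partial\alpha\,\partial\alpha'}\int c(u;\alpha_0)\,du_{-j} = 0$ for $(j,-j) = (1,2)$ with $j \neq -j$; (4) there exists a constant $K$ such that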

max,=i,2Supo<.^.<i E[{u,{l - %) = u,] < K. 

Assumption 4.4' is a sufficient condition to ensure that the copula param- 
eter can be estimated at a root-n parametric rate. It is imposed in Bickel 
et al. (1993) and Chen, Fan and Tsyrennikov (2006) for semiparametric bivariate
copula models. Bickel et al. (1993) have shown that many popular
copula functions such as Clayton, Gaussian, Gumbel, Frank and others all 
satisfy this assumption. We can now apply Theorems 4.1 and 4.2 to obtain 
the following result: 

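Proposition 4.1. Suppose that Assumptions M, 3.1-3.4, 4.1-4.3, 4.4' and 4.5-4.7 hold. Then $\sqrt{n}(\hat\alpha_n - \alpha_0) \to_d N(0, \mathcal{I}_*(\alpha_0)^{-1})$, and $\hat\alpha_n$ is semiparametrically efficient.

In general, there is no closed-form solution of $\mathcal{I}_*(\alpha_0)$. Nevertheless, it can be consistently estimated by a sieve least squares method using its characterization in (4.4). Let $\hat U_t = \hat G_n(Y_t)$ for $t = 1,\ldots,n$. Let $\mathbf{B}_n$ be some sieve space such as

(4.5) $\quad \mathbf{B}_n = \left\{e(u) = \sum_{k=1}^{K_{na}} a_k\sqrt{2}\cos(k\pi u),\ u\in[0,1],\ \sum_{k=1}^{K_{na}} a_k^2 < \infty\right\},$

where $K_{na} \to \infty$, $(K_{na})^2/n \to 0$. For $k = 1,\ldots,d$, we compute $\hat e_k$ as the solution to

$$\min_{e_k\in\mathbf{B}_n}\frac1n\sum_{t=2}^{n}\left(\frac{\partial\log c(\hat U_{t-1},\hat U_t;\hat\alpha_n)}{\partial\alpha_k} - e_k(\hat U_t) - \sum_{j=1}^{2}\frac{\partial\log c(\hat U_{t-1},\hat U_t;\hat\alpha_n)}{\partial u_j}\int_0^{\hat U_{t-2+j}} e_k(u)\,du\right)^2.$$

Denote $\hat e = (\hat e_1,\ldots,\hat e_d)$ and

$$\hat{\mathcal{I}}_* = \frac1n\sum_{t=2}^{n}\hat S_t\hat S_t', \qquad \hat S_t \equiv \frac{\partial\log c(\hat U_{t-1},\hat U_t;\hat\alpha_n)}{\partial\alpha} - \hat e(\hat U_t) - \sum_{j=1}^{2}\frac{\partial\log c(\hat U_{t-1},\hat U_t;\hat\alpha_n)}{\partial u_j}\int_0^{\hat U_{t-2+j}}\hat e(u)\,du.$$

Following the proof of Theorem 5.1 in Ai and Chen (2003) we immediately obtain the following:

Proposition 4.2. Under all of the assumptions of Proposition 4.1, $\hat{\mathcal{I}}_* = \mathcal{I}_*(\alpha_0) + o_P(1)$.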

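4.4. Sieve MLE of the marginal distribution. Let us consider the estimation of $\rho(\gamma_0) = G_0(y)$ for some fixed $y \in \mathcal{Y}$ by the plug-in sieve MLE, $\rho(\hat\gamma_n) = \hat G_n(y) = \int 1(x \le y)\hat g_n(x)\,dx$, where $\hat g_n$ is the sieve MLE for $g_0$.

Clearly, $\frac{\partial\rho(\gamma_0)}{\partial\gamma'}[v] = \int_{\mathcal{Y}} 1(x \le y)v_g(x)\,dx$ for any $v = (v_\alpha', v_g)' \in \mathbf{V}$. It is easy to see that $\omega = \infty$ in Assumption 4.4, and

Hence the representer $v^* \in \overline{\mathbf{V}}$ should satisfy (4.6) and (4.7),

(4.6) $\quad \langle v^*, v\rangle = \dfrac{\partial\rho(\gamma_0)}{\partial\gamma'}[v] = E_0\left(1(Y_t \le y)\dfrac{v_g(Y_t)}{g_0(Y_t)}\right)$ for all $v \in \mathbf{V}$

(4.7) $\quad \left\|\dfrac{\partial\rho(\gamma_0)}{\partial\gamma'}\right\|^2 = \|v^*\|^2 = \sup_{b\in\mathbf{B}:\|b\|>0}\dfrac{|E_0[1(U_t \le G_0(y))b_g(U_t)]|^2}{\|b\|^2}.$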

Proposition 4.3. Let $v^* \in \overline{\mathbf{V}}$ solve (4.6) and (4.7). Suppose that Assumptions M, 3.1-3.4, 4.1-4.3 and 4.5-4.7 hold. Then for any fixed $y \in \mathcal{Y}$, $\sqrt{n}(\hat G_n(y) - G_0(y)) \to_d N(0, \|v^*\|^2)$. Moreover, $\hat G_n$ is semiparametrically efficient.

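Again, there are currently no closed-form expressions for the asymptotic variance $\|v^*\|^2$. Nevertheless, it can also be consistently estimated by the sieve method. Let

$$\hat\sigma_G^2 = \max_{v_\alpha,\ b_g\in\mathbf{B}_n}\frac{\left|\frac1n\sum_{t=1}^{n}1\{\hat U_t \le \hat G_n(y)\}b_g(\hat U_t)\right|^2}{\frac1n\sum_{t=2}^{n}\left[\frac{\partial\log c(\hat U_{t-1},\hat U_t;\hat\alpha_n)}{\partial\alpha'}v_\alpha + b_g(\hat U_t) + \sum_{j=1}^{2}\frac{\partial\log c(\hat U_{t-1},\hat U_t;\hat\alpha_n)}{\partial u_j}\int_0^{\hat U_{t-2+j}} b_g(u)\,du\right]^2},$$

where $\hat U_t = \hat G_n(Y_t)$, and $\mathbf{B}_n$ is given in (4.5).

Proposition 4.4. Under all the assumptions of Proposition 4.3, we have, for any fixed $y \in \mathcal{Y}$,

$$\hat\sigma_G^2 = \|v^*\|^2 + o_P(1).$$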



4.5. Plug-in estimates of conditional quantiles. Under Assumption M, the $q$th conditional quantile of $Y_t$ given $Y_{t-1} = y$ is given by

$$Q_q^Y(y) = G_0^{-1}\left(C_{2|1}^{-1}[q\,|\,G_0(y);\alpha_0]\right).$$

Its plug-in sieve MLE estimate is given by

$$\hat Q_q^Y(y) = \hat G_n^{-1}\left(C_{2|1}^{-1}[q\,|\,\hat G_n(y);\hat\alpha_n]\right).$$
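Let $\rho(\gamma_0) = Q_q^Y(y)$, then by some calculation, for any $v = (v_\alpha', v_g)' \in \mathbf{V}$,

$$\frac{\partial\rho(\gamma_0)}{\partial\gamma'}[v] = \left[\frac{-C_{11}\int 1(x\le y)v_g(x)\,dx - C_{1\alpha}v_\alpha}{c(U_{t-1},C_{2|1}^{-1}(U_{t-1},q;\alpha_0),\alpha_0)} - \int 1(x\le Q_q^Y(y))v_g(x)\,dx\right]\Big/ g_0(Q_q^Y(y)),$$

where

$$C_{11} \equiv \frac{\partial^2 C(U_{t-1},C_{2|1}^{-1}(U_{t-1},q;\alpha_0),\alpha_0)}{\partial u_1^2}, \qquad C_{1\alpha} \equiv \frac{\partial^2 C(U_{t-1},C_{2|1}^{-1}(U_{t-1},q;\alpha_0),\alpha_0)}{\partial u_1\,\partial\alpha}.$$

We can see $\omega = 2$ in Assumption 4.4, as long as $g_0(Q_q^Y(y)) \neq 0$ and $c(U_{t-1},C_{2|1}^{-1}(U_{t-1},q;\alpha_0),\alpha_0) \neq 0$, which are satisfied under Assumption M(2). Thus we have

$$\left\|\frac{\partial\rho(\gamma_0)}{\partial\gamma'}\right\|^2 = \sup_{v\in\mathbf{V}:\|v\|>0}\frac{\{g_0(Q_q^Y(y))\}^{-2}}{\|v\|^2}\left|\frac{-C_{11}\int 1(x\le y)v_g(x)\,dx - C_{1\alpha}v_\alpha}{c(U_{t-1},C_{2|1}^{-1}(U_{t-1},q;\alpha_0),\alpha_0)} - \int 1(x\le Q_q^Y(y))v_g(x)\,dx\right|^2 < \infty.$$

Hence the Riesz representer $v^* \in \overline{\mathbf{V}}$ should satisfy: $\langle v^*, v\rangle = \frac{\partial\rho(\gamma_0)}{\partial\gamma'}[v]$ for all $v \in \mathbf{V}$, and $\|v^*\|^2 = \|\frac{\partial\rho(\gamma_0)}{\partial\gamma'}\|^2$. Applying Theorems 4.1 and 4.2 we immediately obtain the following: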
ately obtain the following: 



Proposition 4.5. Let $v^* \in \overline{\mathbf{V}}$ be the Riesz representer for $Q_q^Y(y)$. Suppose that Assumptions M, 3.1-3.4, 4.1-4.3 and 4.5-4.7 hold. Then for a fixed $y \in \mathcal{Y}$, $\sqrt{n}(\hat Q_q^Y(y) - Q_q^Y(y)) \to_d N(0, \|v^*\|^2)$. Moreover, $\hat Q_q^Y(y)$ is semiparametrically efficient.



5. Sieve likelihood ratio inference for smooth functionals. In this section, we are interested in the sieve likelihood ratio inference for smooth functional $\rho(\gamma) = (\rho_1(\gamma),\ldots,\rho_k(\gamma))' : \Gamma \to \mathcal{R}^k$,

$$H_0 : \rho(\gamma_0) = 0,$$

where $\rho$ is a vector of known functionals. [For instance, $\rho(\gamma) = \alpha - \alpha_0 \in \mathcal{R}^d$ or $\rho(\gamma) = G(y) - G_0(y) \in \mathcal{R}$ for fixed $y$.] Without loss of generality, we assume that $\frac{\partial\rho_1(\gamma_0)}{\partial\gamma'},\ldots,\frac{\partial\rho_k(\gamma_0)}{\partial\gamma'}$ are linearly independent. Otherwise a linear transformation can be conducted for the hypothesis.

Suppose that $\rho_i$ satisfies Assumption 4.4 for $i = 1,\ldots,k$. Then by the Riesz representation theorem, there exists a $v_i^* \in \overline{\mathbf{V}}$ such that

$$\frac{\partial\rho_i(\gamma_0)}{\partial\gamma'}[v] = \langle v_i^*, v\rangle \quad\text{for all } v \in \mathbf{V}.$$

Denote $v^* = (v_1^*,\ldots,v_k^*)'$. By the Gram-Schmidt orthogonalization, without loss of generality, we assume $\langle v_i^*, v_j^*\rangle = 0$ for any $i \neq j$.

Shen and Shi (2005) provide a theory on the sieve likelihood ratio inference for i.i.d. data. We now extend their result to strictly stationary Markov time series data. Denote

$$\hat\gamma_n = \arg\max_{\gamma\in\Gamma_n} L_n(\alpha,g); \qquad \tilde\gamma_n = \arg\max_{\gamma\in\Gamma_n,\ \rho(\gamma)=0} L_n(\alpha,g).$$






X. CHEN, W. B. WU AND Y. YI 



Theorem 5.1. Suppose that Assumptions M, 3.1-3.4, 4.1-4.3 and 4.5-4.7 hold, also that Assumption 4.4 holds with $\rho_i$, $i = 1,\ldots,k$, and Assumption 4.5(2) holds with $v_i^*$, $i = 1,\ldots,k$. Then

$$2n(L_n(\hat\gamma_n) - L_n(\tilde\gamma_n)) \to_d \chi^2_{(k)},$$

where $\chi^2_{(k)}$ stands for the chi-square distribution with $k$ degrees of freedom, and $\frac{\partial\rho_1(\gamma_0)}{\partial\gamma'},\ldots,\frac{\partial\rho_k(\gamma_0)}{\partial\gamma'}$ are assumed to be linearly independent.

We can apply Theorem 5.1 to construct confidence regions of any smooth functionals. For example, we can compute a confidence region for the copula parameter $\alpha$. Define $\tilde g_n(\alpha) = \arg\max_{g\in\mathcal{G}_n} L_n(\alpha,g)$. By Theorem 5.1, $2n(L_n(\hat\alpha_n,\tilde g_n(\hat\alpha_n)) - L_n(\alpha_0,\tilde g_n(\alpha_0))) \to_d \chi^2_{(d)}$, where $(\hat\alpha_n,\tilde g_n(\hat\alpha_n)) = \hat\gamma_n$ is the original sieve MLE.
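
Inverting the likelihood ratio statistic over a grid can be sketched as follows for a scalar copula parameter ($d = 1$). This is a minimal illustration and not the authors' implementation: `lr_confidence_set` and `profile_loglik` are hypothetical names, the profiled average log-likelihood $L_n(\alpha, \tilde g_n(\alpha))$ is assumed to be supplied by the user, and the hard-coded constant is the 95% quantile of the chi-square distribution with one degree of freedom.

```python
# Hypothetical helper: invert a profile sieve likelihood ratio statistic
# over a grid of candidate copula parameters (scalar case, d = 1).

CHI2_1_95 = 3.841458820694124  # 95% quantile of the chi-square(1) distribution


def lr_confidence_set(profile_loglik, alpha_grid, n, crit=CHI2_1_95):
    """Return (alpha_hat, accepted), where accepted holds every grid value a
    with 2 * n * (L_n(alpha_hat) - L_n(a)) <= crit.

    profile_loglik(a) must return the *average* profiled log-likelihood;
    here it is an arbitrary user-supplied callable (an assumption).
    """
    values = [profile_loglik(a) for a in alpha_grid]
    best = max(range(len(alpha_grid)), key=lambda i: values[i])
    l_max = values[best]
    accepted = [a for a, v in zip(alpha_grid, values)
                if 2.0 * n * (l_max - v) <= crit]
    return alpha_grid[best], accepted


if __name__ == "__main__":
    # Toy quadratic profile, used only to exercise the function.
    toy = lambda a: -0.5 * (a - 2.0) ** 2
    grid = [1.5 + 0.01 * i for i in range(101)]
    a_hat, region = lr_confidence_set(toy, grid, n=400)
    print(a_hat, min(region), max(region))
```

The grid maximizer stands in for $\hat\alpha_n$, so the grid should be fine enough that the discretization error is small relative to $n^{-1/2}$.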

6. Monte Carlo comparison of several estimators. In this section, we 
address the finite sample performance of sieve MLE by comparing it to 
several existing popular estimators: the two-step semiparametric estimator 
proposed in Chen and Fan (2006), the ideal (or infeasible) MLE, the correctly 
specified parametric MLE and the misspecified parametric MLE. 

6.1. Existing estimators. For comparison, we briefly review several ex- 
isting estimators that have been used in applied work. 

6.1.1. Two-step semiparametric estimator. Chen and Fan (2006) pro- 
pose the following two-step semiparametric procedure: 

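Step 1. Estimate the unknown true marginal distribution $G_0(y)$ by the empirical distribution function $\tilde G_n(y) \equiv \frac{n}{n+1}G_n(y)$, where $G_n(y) = \frac1n\sum_{t=1}^{n}1\{Y_t \le y\}$.

Step 2. Estimate the copula dependence parameter $\alpha_0$ by

$$\hat\alpha_n^{2sp} = \arg\max_{\alpha}\frac1n\sum_{t=2}^{n}\log c(\tilde G_n(Y_{t-1}),\tilde G_n(Y_t);\alpha).$$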
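
The two steps can be sketched in a few lines for one concrete family. The example below is an illustration under stated assumptions rather than the procedure used in the paper's simulations: it takes the Clayton copula (whose density $c(u_1,u_2;\alpha) = (1+\alpha)(u_1u_2)^{-\alpha-1}(u_1^{-\alpha}+u_2^{-\alpha}-1)^{-2-1/\alpha}$ is standard), assumes no ties in the data so that ranks give the rescaled empirical CDF, and replaces a proper optimizer with a coarse grid search. The function names are hypothetical.

```python
import math


def rescaled_ecdf(sample):
    """Step 1: map each Y_t to rank / (n + 1), i.e. (n / (n + 1)) * G_n(Y_t).

    Assumes no ties, which holds almost surely for continuous data.
    """
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1.0)
    return u


def clayton_log_density(u1, u2, a):
    """log c(u1, u2; a) for the Clayton copula with a > 0."""
    s = u1 ** (-a) + u2 ** (-a) - 1.0
    return (math.log1p(a) - (a + 1.0) * (math.log(u1) + math.log(u2))
            - (2.0 + 1.0 / a) * math.log(s))


def two_step_clayton(sample, grid):
    """Step 2: maximise the pseudo log-likelihood over a grid of a values."""
    u = rescaled_ecdf(sample)

    def pseudo_loglik(a):
        return sum(clayton_log_density(u[t - 1], u[t], a)
                   for t in range(1, len(u))) / len(u)

    return max(grid, key=pseudo_loglik)
```

Because only ranks enter Step 2, the estimate is invariant to monotone transformations of the data, which is why the marginal distribution never needs to be specified here.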

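Assuming that the process $\{Y_t\}_{t=1}^{n}$ is $\beta$-mixing with a certain decay rate, under Assumption M and some other mild regularity conditions, Chen and Fan (2006) show that

$$\sqrt{n}(\hat\alpha_n^{2sp} - \alpha_0) \to_d N(0,\sigma_{2sp}^2), \quad\text{with } \sigma_{2sp}^2 = B_0^{-1}\Sigma_{2sp}B_0^{-1},$$

where $B_0 = -E_0\left(\frac{\partial^2\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha\,\partial\alpha'}\right) = \Sigma_{ideal}$ (under Assumption 4.4'), and

$$\Sigma_{2sp} \equiv \lim_{n\to\infty}\mathrm{Var}_0\left\{\frac{1}{\sqrt n}\sum_{t=2}^{n}\left[\frac{\partial\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha} + W_1(U_{t-1}) + W_2(U_t)\right]\right\} < \infty,$$

$$W_j(U) \equiv \int_0^1\!\!\int_0^1 [1\{U \le u_j\} - u_j]\frac{\partial^2\log c(u_1,u_2;\alpha_0)}{\partial\alpha\,\partial u_j}c(u_1,u_2;\alpha_0)\,du_1\,du_2, \quad j = 1,2.$$

If we only care about estimation and inference of the copula parameter $\alpha$, we could also extend the results of Murphy and van der Vaart (2000) on profile likelihood ratio to our copula-based semiparametric Markov models.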



Example 6.1 (Two-step semiparametric estimator of Gaussian copula parameter). The bivariate Gaussian copula is $C(u_1,u_2;\alpha) = \Phi_\alpha(\Phi^{-1}(u_1),\Phi^{-1}(u_2))$ for $|\alpha| < 1$, where $\Phi_\alpha$ is the bivariate standard normal distribution with correlation $\alpha$, and $\Phi$ is the scalar standard normal distribution. Chen and Fan (2006) show that

$$\sqrt{n}(\hat\alpha_n^{2sp} - \alpha_0) \to_d N(0, 1-\alpha_0^2).$$

Klaassen and Wellner (1997) establish that the semiparametric efficient variance bound for estimating a Gaussian copula parameter $\alpha$ is $1 - \alpha^2$; hence $\hat\alpha_n^{2sp}$ is semiparametrically efficient for a Gaussian copula. However, as pointed out by Genest and Werker (2002), the Gaussian copula and the independence copula are the only two copulas for which the two-step semiparametric estimator is efficient for $\alpha_0$. Moreover, the empirical CDF estimator is still inefficient for $G_0(\cdot)$, even in this Gaussian copula-based Markov model.



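6.1.2. Possibly misspecified parametric MLE. Denote $G(y,\theta)$ [$g(y,\theta)$] as the marginal distribution (marginal density) whose functional form is known up to the unknown finite-dimensional parameter $\theta$. Then the observed joint parametric log-likelihood for $\{Y_t\}_{t=1}^{n}$ is

$$L_n(\alpha,\theta) = \frac1n\sum_{t=1}^{n}\log g(Y_t,\theta) + \frac1n\sum_{t=2}^{n}\log c(G(Y_{t-1},\theta),G(Y_t,\theta);\alpha),$$

and the parametric MLE is $(\hat\alpha_n^{p},\hat\theta_n^{p}) = \arg\max_{(\alpha,\theta)\in\mathcal{A}\times\Theta} L_n(\alpha,\theta)$, where $\mathcal{A}\times\Theta$ is the parameter space. Under Assumption M and some other mild regularity conditions, we have

$$\sqrt{n}((\hat\alpha_n^{p},\hat\theta_n^{p}) - (\alpha^*,\theta^*)) \to_d N(0, B_p^{-1}\Sigma_p B_p^{-1}),$$

where $B_p = -E_0\left(\frac{\partial^2\ell(\alpha^*,\theta^*,Z_t)}{\partial(\alpha,\theta)\,\partial(\alpha,\theta)'}\right)$ is nonsingular and $\Sigma_p = \lim_{n\to\infty}\mathrm{Var}\left\{\frac{1}{\sqrt n}\sum_{t=2}^{n}\frac{\partial\ell(\alpha^*,\theta^*,Z_t)}{\partial(\alpha,\theta)}\right\}$.

6.1.3. Efficiency of correctly specified parametric MLE. Asymptotic properties for correctly specified MLEs for Markov processes have been discussed in Section 10.4 of Joe (1997) and Billingsley (1961b). Under Assumption M and the correct specification of marginal $G(Y_t,\theta^*) = G_0(Y_t)$, we have $\alpha^* = \alpha_0$, $B_p = \Sigma_p = \Sigma_{0p} \equiv E_0\left(\frac{\partial\ell(\alpha_0,\theta^*,Z_t)}{\partial(\alpha,\theta)}\left\{\frac{\partial\ell(\alpha_0,\theta^*,Z_t)}{\partial(\alpha,\theta)}\right\}'\right)$, and $(\hat\alpha_n^{p},\hat\theta_n^{p})$ is $\sqrt{n}$-efficient for $(\alpha_0,\theta^*)$ with asymptotic variance $\Sigma_{0p}^{-1}$. Moreover, $\sqrt{n}(\hat\alpha_n^{p} - \alpha_0) \to_d N(0,\mathcal{I}_{*p}(\alpha_0)^{-1})$ with

$$\mathcal{I}_{*p}(\alpha_0) = \min_{b} E_0\left[\left(\frac{\partial\ell(\alpha_0,\theta^*,Z_t)}{\partial\alpha} - b\,\frac{\partial\ell(\alpha_0,\theta^*,Z_t)}{\partial\theta}\right)\left(\frac{\partial\ell(\alpha_0,\theta^*,Z_t)}{\partial\alpha} - b\,\frac{\partial\ell(\alpha_0,\theta^*,Z_t)}{\partial\theta}\right)'\right].$$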

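6.1.4. Ideal (or infeasible) MLE. We denote $\hat\alpha_n^{ideal}$ as the ideal (or infeasible) MLE of the copula parameter $\alpha_0$ when the marginal $G_0(\cdot)$ is assumed to be completely known. Let $\hat\alpha_n^{ideal} = \arg\max_{\alpha\in\mathcal{A}}\frac1n\sum_{t=2}^{n}\log c(U_{t-1},U_t;\alpha)$. Suppose that Assumption M holds with a completely known $G(\cdot,\theta) = G_0(\cdot)$. Then $B_0 = -E_0\left(\frac{\partial^2\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha\,\partial\alpha'}\right) = \Sigma_{ideal}$ is finite and nonsingular and $\hat\alpha_n^{ideal}$ is efficient, thus

$$\sqrt{n}(\hat\alpha_n^{ideal} - \alpha_0) \to_d N(0,\Sigma_{ideal}^{-1}).$$

Remark 6.1. Since $\mathcal{I}_*(\alpha_0) \le \mathcal{I}_{*p}(\alpha_0) \le \Sigma_{ideal}$, we have $\mathcal{I}_*(\alpha_0)^{-1} \ge \mathcal{I}_{*p}(\alpha_0)^{-1} \ge \Sigma_{ideal}^{-1}$. Also, Proposition 4.1 immediately implies that $\sigma_{2sp}^2 \ge \mathcal{I}_*(\alpha_0)^{-1}$.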

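Example 6.1' (The ideal MLE of a Gaussian copula parameter). For the Gaussian copula in Example 6.1, it is easy to verify that

$$\Sigma_{ideal} = B_0 = -E_0\left(\frac{\partial^2\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha^2}\right) = \frac{1+\alpha_0^2}{(1-\alpha_0^2)^2} < \infty \quad\text{if } \alpha_0^2 \neq 1.$$

Consequently, $\sqrt{n}(\hat\alpha_n^{ideal} - \alpha_0) \to_d N(0,\Sigma_{ideal}^{-1})$ with $\Sigma_{ideal}^{-1} = \frac{(1-\alpha_0^2)^2}{1+\alpha_0^2}$.

We note that the asymptotic variance $\mathrm{Avar}(\hat\alpha_n^{ideal}) = \Sigma_{ideal}^{-1} \le 1 - \alpha_0^2 = \mathrm{Avar}(\hat\alpha_n^{2sp})$ and $\mathrm{Avar}(\hat\alpha_n^{ideal}) = \mathrm{Avar}(\hat\alpha_n^{2sp})$ if and only if $\alpha_0 = 0$ (i.e., independent copula). Also $\mathrm{Avar}(\hat\alpha_n^{ideal})$ is decreasing in $|\alpha_0|$.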
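
To see how large the gap between the two variances is, one can simply tabulate them. The snippet below is a small numerical illustration, taking the two formulas as stated for the Gaussian copula, namely $(1-\alpha_0^2)^2/(1+\alpha_0^2)$ for the ideal MLE and $1-\alpha_0^2$ for the two-step estimator; the function names are ours.

```python
def avar_ideal_gaussian(a):
    """Asymptotic variance of the ideal MLE: (1 - a^2)^2 / (1 + a^2)."""
    return (1.0 - a * a) ** 2 / (1.0 + a * a)


def avar_two_step_gaussian(a):
    """Asymptotic variance of the two-step estimator: 1 - a^2."""
    return 1.0 - a * a


if __name__ == "__main__":
    for a in (0.0, 0.3, 0.6, 0.9):
        # The ratio ideal / two-step equals (1 - a^2) / (1 + a^2).
        print(a, avar_ideal_gaussian(a), avar_two_step_gaussian(a))
```

The ratio of the two variances is $(1-\alpha_0^2)/(1+\alpha_0^2)$, so knowing the marginal matters more as the dependence gets stronger.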

Example 2.1' (The ideal MLE of a Clayton copula parameter). For the 
Clayton copula in Example 2.1, after some tedious calculation, we have 

^ „ 1 1 (l + a)(l + 2a) ^ , , 

a(l-l-a) a(l -I- a)^(l -I- 2a) a" 
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where Int(a) = xy{iogx ^°s^^ 'j^jl+^i/a — dx dy which is a small 

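number bounded in $[-1,1]$. Therefore, $\Sigma_{ideal} \in (0,\infty)$, provided that $\alpha > 0$. Hence $\sqrt{n}(\hat\alpha_n^{ideal} - \alpha_0) \to_d N(0,\Sigma_{ideal}^{-1})$, where the asymptotic variance $\Sigma_{ideal}^{-1}$ is increasing in $\alpha_0$ and is $O(\alpha_0^2)$.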

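Example 2.1' (The ideal MLE of EFGM copula parameter). For the EFGM copula with $C(u_1,u_2;\alpha) = u_1u_2(1+\alpha(1-u_1)(1-u_2))$, $\alpha \in [-1,1]$, the copula density function is

$$c(u_1,u_2;\alpha) = \frac{\partial^2}{\partial u_1\,\partial u_2}C(u_1,u_2;\alpha) = 1 + \alpha - 2\alpha(u_1+u_2) + 4\alpha u_1u_2.$$

Let $\mathrm{Li}_2(z) = \sum_{k=1}^{\infty} z^k/k^2$ be the polylogarithm function with order 2. Then

$$\Sigma_{ideal} = -E_0\left(\frac{\partial^2\log c(U_{t-1},U_t;\alpha_0)}{\partial\alpha\,\partial\alpha}\right) = \int_0^1\!\!\int_0^1\frac{(1-2u_1-2u_2+4u_1u_2)^2}{1+\alpha-2\alpha(u_1+u_2)+4\alpha u_1u_2}\,du_1\,du_2 = \sum_{k=1}^{\infty}\frac{\alpha^{2k-2}}{(1+2k)^2} = \frac{\mathrm{Li}_2(|\alpha|) - \mathrm{Li}_2(\alpha^2)/4 - |\alpha|}{|\alpha|^3}.$$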
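
The series and the polylogarithm expression for the EFGM information can be checked against each other numerically. The sketch below uses truncated series only (no special-function library), so it is an approximation whose truncation level `terms` is our choice; it is meant as a sanity check of the identity rather than a production routine.

```python
def li2(z, terms=2000):
    """Dilogarithm Li_2(z) = sum_{k>=1} z^k / k^2 for |z| <= 1 (truncated)."""
    return sum(z ** k / (k * k) for k in range(1, terms + 1))


def sigma_ideal_series(a, terms=2000):
    """Sigma_ideal = sum_{k>=1} a^(2k-2) / (1 + 2k)^2 (truncated)."""
    return sum(a ** (2 * k - 2) / (1.0 + 2 * k) ** 2
               for k in range(1, terms + 1))


def sigma_ideal_closed(a):
    """[Li_2(|a|) - Li_2(a^2)/4 - |a|] / |a|^3, valid for 0 < |a| <= 1."""
    b = abs(a)
    return (li2(b) - li2(b * b) / 4.0 - b) / b ** 3


if __name__ == "__main__":
    for a in (0.25, 0.5, 0.75):
        print(a, sigma_ideal_series(a), sigma_ideal_closed(a))
```

At $\alpha = 0$ the series reduces to its first term $1/9$, which is the information at the independence copula; the closed form is not evaluated there because of the division by $|\alpha|^3$.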

6.2. Simulations. One can simulate a strictly stationary first-order Markov process $\{Y_t\}_{t=1}^{n}$ from a specified bivariate copula $C(u_1,u_2;\alpha_0)$ with given invariant CDF $G_0$ as follows:

Step 1. Generate an i.i.d. sequence of uniform random variables $\{V_t\}_{t=1}^{n}$.

Step 2. Set $U_1 = V_1$ and $U_t = C_{2|1}^{-1}[V_t\,|\,U_{t-1},\alpha_0]$.

Step 3. Set $Y_t = G_0^{-1}(U_t)$ for $t = 1,\ldots,n$.
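
The three steps translate directly into code once $C_{2|1}^{-1}$ is available in closed form. The sketch below does this for the Clayton copula, using the conditional quantile $C_{2|1}^{-1}[q|u;\alpha] = [(q^{-\alpha/(1+\alpha)}-1)u^{-\alpha}+1]^{-1/\alpha}$. It is only an illustration: the function names are ours, the marginal quantile function is passed in by the caller, and the example call uses a standard normal marginal as a stand-in because the Python standard library has no Student $t$ quantile function.

```python
import random
from statistics import NormalDist


def clayton_cond(w, u, a):
    """C_{2|1}[w | u; a] = dC(u, w; a)/du for the Clayton copula, a > 0."""
    return u ** (-a - 1.0) * (u ** (-a) + w ** (-a) - 1.0) ** (-1.0 - 1.0 / a)


def clayton_cond_inv(q, u, a):
    """Conditional quantile C^{-1}_{2|1}[q | u; a] of U_t given U_{t-1} = u."""
    return ((q ** (-a / (1.0 + a)) - 1.0) * u ** (-a) + 1.0) ** (-1.0 / a)


def _open_uniform(rng):
    """Uniform draw on (0, 1), resampling the (very unlikely) exact zero."""
    v = rng.random()
    while v == 0.0:
        v = rng.random()
    return v


def simulate_clayton_markov(n, a, marginal_quantile, burn_in=2000, seed=0):
    """Steps 1-3: draw V_t i.i.d. uniform, iterate U_t, then map to Y_t."""
    rng = random.Random(seed)
    u = _open_uniform(rng)                  # U_1 = V_1
    path = []
    for _ in range(burn_in + n):
        path.append(marginal_quantile(u))   # Y_t = G_0^{-1}(U_t)
        u = clayton_cond_inv(_open_uniform(rng), u, a)
    return path[burn_in:]                   # drop the burn-in observations


if __name__ == "__main__":
    ys = simulate_clayton_markov(1000, 2.0, NormalDist().inv_cdf)
    print(len(ys), min(ys), max(ys))
```

Since $U_1$ is already uniform the chain starts in its stationary distribution, so the burn-in is a precaution rather than a necessity; it is kept here to mirror the practice of discarding initial observations.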

In our simulation study, we consider several first-order Markov models generated via different classes of copulas (Clayton, Gumbel, Frank, Gaussian and EFGM), with either Student $t_3$ or $t_5$ marginal distribution. Thus the true marginal distribution is $G_0 = t_\nu$ with density $g_0(y) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}(1 + y^2/\nu)^{-0.5(\nu+1)}$ with degrees of freedom $\nu = 3$ or $5$. For each specified copula $C(u_1,u_2;\alpha_0)$, we generate a long time series, but we delete the first 2000 observations and keep the last 1000 observations as our simulated data sample $\{Y_t\}$ (i.e., a simulated sample size $n = 1000$).

For all the copula-based Markov models and for each simulated sample, we compute five estimators of $\alpha_0$: sieve MLEs, ideal (or infeasible) MLEs, two-step estimators, correctly specified parametric MLEs (when the functional form of $g$ is correctly specified) and misspecified parametric MLEs (when the functional form of $g$ is misspecified). Sieve MLEs are computed by maximizing the joint log-likelihood $L_n(\alpha,g)$ in (3.1) using either a polynomial sieve or a polynomial spline sieve to approximate the log-marginal density ($\log g$). The selection of $K$, the number of the sieve terms, is based on the so-called small sample AIC of Burnham and Anderson (2002), $\hat K = \arg\max_K\{L_n(\hat\gamma_n(K)) - K/(n-K-1)\}$, where $\hat\gamma_n(K)$ is the sieve MLE of $\gamma_0 = (\alpha_0,g_0)$ using $K$ as the sieve number of terms.
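
The selection rule is easy to apply once the maximized average log-likelihood is available for each candidate $K$. A minimal sketch follows; the function name and the dictionary interface are hypothetical, and the numbers in the usage example are made up purely to show the trade-off between fit and penalty.

```python
def select_sieve_terms(avg_loglik_by_k, n):
    """Return the K maximising L_n(gamma_hat(K)) - K / (n - K - 1).

    avg_loglik_by_k maps each candidate K to the maximised *average*
    log-likelihood L_n obtained with K sieve terms.
    """
    def criterion(k):
        if n - k - 1 <= 0:
            raise ValueError("need n > K + 1")
        return avg_loglik_by_k[k] - k / (n - k - 1.0)

    return max(sorted(avg_loglik_by_k), key=criterion)


if __name__ == "__main__":
    # Made-up fitted values: the gains in fit flatten out after K = 4.
    fits = {2: -1.4300, 3: -1.4210, 4: -1.4150, 5: -1.4146, 6: -1.4144}
    print(select_sieve_terms(fits, n=1000))
```

Because $L_n$ is an average, the penalty $K/(n-K-1)$ is on the same scale as the fit improvement per observation, so extra terms are accepted only while they raise the average log-likelihood by roughly more than $1/n$ each.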

We compare the estimates of the copula parameter, and the estimates
of the 1/3 and 2/3 marginal quantiles, in terms of Monte Carlo means,
biases, variances, mean squared errors and confidence regions based on 1000 
Monte Carlo simulation runs. 

Brief summary of MC results. In the longer version posted on arXiv 
[Chen, Wu and Yi (2009)], we report all the simulation findings in detail. 
Here we only report a few Monte Carlo results for Clayton and Gumbel 
copula-based Markov models in Appendix B, and give a brief summary of the 
overall patterns. (1) Sieve MLEs of copula parameters always perform better 
than the two-step estimator in terms of bias and MSE, except for Gaussian 
copulas and EFGM copulas. For Gaussian copulas, we already explained (in 
Example 6.1) that both the sieve MLE and the two-step estimators are semiparametrically
efficient for the copula parameter with unknown marginal distributions.
For EFGM copulas, the distance between the EFGM copula function
and the independent copula function satisfies $|\alpha u_1u_2(1-u_1)(1-u_2)| \le 0.0625|\alpha|$
for $\alpha \in [-1,1]$. Therefore, the EFGM copula is very close to the independent
copula; hence the performance of the sieve MLE, the two-step, the correctly 
specified parametric MLE and the ideal MLE for copula parameters are 
all very close to one another; (2) For all the copula-based Markov models 
with some dependence in terms of Kendall's $\tau \neq 0$, including Gaussian and
EFGM copula-based Markov models, sieve MLEs of marginal distributions 
always perform better than the empirical CDFs in terms of bias and MSE; 
(3) For Markov models generated via strong tail dependent copulas, both 
the two-step-based estimators of copula parameters and the empirical CDF 
estimator of the marginal distribution perform very poorly, both having big 
biases and big MSEs; (4) Sieve MLEs perform very well even for copulas 
with strong tail dependence and fat-tailed marginal density $t_3$; (5) Extreme
conditional quantiles estimated via sieve MLEs are much more precise than 
those estimated via two-step estimators; (6) Misspecified parametric MLEs 
could lead to inconsistent estimation of copula parameters (in addition to 
inconsistent estimation of marginal density parameters). In summary we 
recommend sieve MLEs to estimate copula-based Markov models and their 
implied conditional quantiles (VaRs). 

7. Conclusions. In this paper, we first show that several widely used tail
dependent copula-generated Markov models are in fact geometrically ergodic
(hence geometrically $\beta$-mixing), although their time series plots may look
highly persistent and "long memory alike." We then propose sieve MLEs
for the class of first-order strictly stationary copula-based semiparametric
Markov models that are characterized by the parametric copula parameter
$\alpha_0$ and the unknown invariant density $g_0(\cdot)$. We show that the sieve MLEs of
any smooth functional of $(\alpha_0, g_0)$ are root-n consistent, asymptotically normal
and efficient, and that their sieve likelihood ratio statistics are asymptotically
chi-square distributed. We propose either consistent plug-in estimation
of the asymptotic variance or inverting the sieve likelihood ratio statistics 
to construct confidence regions for the sieve MLEs. Monte Carlo studies 
indicate that, even for semiparametric Markov models generated via tail de- 
pendent copulas with fat-tailed marginal distributions, the sieve MLEs of 
the copula parameter, the marginal CDFs and the conditional quantiles all 
perform very well in finite samples. 

In this paper, we assume that the parametric copula function is correctly 
specified. We could test this assumption by performing a sieve likelihood 
ratio test [see, e.g., Fan and Jiang (2007) for a review about generalized likelihood
ratio tests]. Alternatively, we could also consider a joint sieve ML estimation
of nonparametric copulas and nonparametric marginals. Recently,
Chen, Peng and Zhao (2009) provided an empirical likelihood estimation of 
nonparametric copulas using a bivariate random sample; their method could 
be extended to our time series setting. 



APPENDIX A: MATHEMATICAL PROOFS

We first recall some equivalent definitions of $\beta$-mixing and ergodicity for strictly stationary Markov processes. Then we present the drift criterion for geometric ergodicity of Markov chains.

Definition A.1. (1) [Davydov (1973)] For a strictly stationary Markov process $\{Y_t\}_{t=1}^{\infty}$, the $\beta$-mixing coefficients are given by

$$\beta_t = \int\sup_{0\le\phi\le 1}|E[\phi(Y_{t+1})|Y_1 = y] - E[\phi(Y_{t+1})]|\,dG_0(y).$$

The process $\{Y_t\}$ is $\beta$-mixing if $\lim_{t\to\infty}\beta_t = 0$; and it is geometric $\beta$-mixing if $\beta_t \le \gamma\exp(-\delta t)$ for some $\delta, \gamma > 0$.

(2) [Chan and Tong (2001)] A strictly stationary Markov process $\{Y_t\}$ is (Harris) ergodic if

$$\lim_{t\to\infty}\sup_{0\le\phi\le 1}|E[\phi(Y_{t+1})|Y_1 = y] - E[\phi(Y_{t+1})]| = 0 \quad\text{for almost all } y;$$

and it is geometrically ergodic if there exists a measurable function $W$ with $\int W(y)\,dG_0(y) < \infty$ and a constant $\kappa \in [0,1)$ such that for all $t \ge 1$,

(A.1) $\quad \sup_{0\le\phi\le 1}|E[\phi(Y_{t+1})|Y_1 = y] - E[\phi(Y_{t+1})]| \le \kappa^t W(y).$

Definition A.2. Let $\{Y_t\}$ be an irreducible Markov chain with a transition measure $P^n(y;A) = P(Y_{t+n} \in A|Y_t = y)$, $n \ge 1$. A nonnull set $S$ is called small if there exists a positive integer $n$, a constant $b > 0$ and a probability measure $\nu(\cdot)$ such that $P^n(y;A) \ge b\nu(A)$ for all $y \in S$ and all measurable sets $A$.

Theorem A.1 [Theorem B.1.4 in Chan and Tong (2001)]. Let $\{Y_t\}$ be an irreducible and aperiodic Markov chain. Suppose there exists a small set $S$, a nonnegative measurable function $L$ which is bounded away from $0$ and $\infty$ on $S$, and constants $r > 1$, $\gamma > 0$, $K > 0$ such that

(A.2) $\quad rE[L(Y_{t+1})|Y_t = y] \le L(y) - \gamma$ for all $y \notin S$,

and, let $S'$ be the complement of $S$,

(A.3) $\quad \int_{S'} L(w)P(y,dw) < K$ for all $y \in S$.

Then $\{Y_t\}$ is geometrically ergodic and (A.1) holds. Here $L$ is called the Lyapunov function.

Proof of Theorem 2.1. We establish the results by applying Theorem A.1 or applying Proposition 2.1(i) of Chen and Fan (2006).

(1) For a Clayton copula, let $\{Y_t\}_{t=1}^{n}$ be a stationary Markov process of order 1 generated from a bivariate Clayton copula and a marginal CDF $G_0(\cdot)$. Then the transformed process $\{U_t \equiv G_0(Y_t)\}_{t=1}^{n}$ has uniform marginals and a Clayton copula joint distribution of $(U_{t-1},U_t)$. When $\alpha = 0$, the Clayton copula becomes the independence copula; hence the process $\{U_t \equiv G_0(Y_t)\}_{t=1}^{n}$ is i.i.d. and trivially geometrically ergodic.

Let $\alpha > 0$. Recall that $C_{2|1}[w|u;\alpha] = \frac{\partial}{\partial u}C(u,w;\alpha) = (u^{-\alpha} + w^{-\alpha} - 1)^{-1-1/\alpha}u^{-1-\alpha}$, then $C_{2|1}^{-1}[q|u;\alpha] = [(q^{-\alpha/(1+\alpha)} - 1)u^{-\alpha} + 1]^{-1/\alpha}$ is the $q$th conditional quantile of $U_t$ given $U_{t-1} = u$. Denote $X_t = U_t^{-\alpha}$. Let $\{V_t\}_{t=1}^{n}$ be a sequence of i.i.d. uniform(0, 1) random variables such that $V_t$ is independent of $U_{t-1}$. Let $q = V_t$ in the above conditional quantile expression of $U_t$ given $U_{t-1}$, then we obtain the following nonlinear AR(1) model from the Clayton copula:

$$X_t = (V_t^{-\alpha/(1+\alpha)} - 1)X_{t-1} + 1 \quad\text{with } X_1 = U_1^{-\alpha},\ U_1 \sim \text{uniform}(0,1).$$

Note that the state space of $\{X_t\}$ is $(1,\infty)$. Since $E_0[(V_t^{-\alpha/(1+\alpha)} - 1)^{1/\alpha}] = 1$, we can let $p \in (0,1/\alpha)$, and $L(x) = x^p \ge 1$ be the Lyapunov function. Then by Hölder's inequality, $\rho \equiv E_0[L(V_t^{-\alpha/(1+\alpha)} - 1)] < 1$. Let $r = \rho^{-1/2} > 1$,

$$x_0 = \max\{x \ge 1 : rE_0[|x(V_t^{-\alpha/(1+\alpha)} - 1) + 1|^p] \ge x^p - 1\}.$$

Such $x_0$ always exists since

$$\lim_{x\to\infty}\frac{rE_0[|x(V_t^{-\alpha/(1+\alpha)} - 1) + 1|^p]}{x^p - 1} = r\rho = \rho^{1/2} < 1.$$

Let set $S = [1,x_0]$. Clearly, $L$ is bounded away from $0$ and $\infty$ on $S$. We now show that $S$ is a small set. Let $f(\cdot|x)$ be the conditional density function of $X_1$ given $X_0 = x$. Then

$$f(y|x) = \frac{(1+\alpha)x^{1+1/\alpha}}{\alpha(y-1+x)^{2+1/\alpha}} \ge \frac{1+\alpha}{\alpha(y-1+x_0)^{2+1/\alpha}}$$

if $x \in S$. Choose the probability measure $\nu$ on $(1,\infty)$ as $\nu(dy) = f(y|x_0)\,dy$. Then

$$\Pr(X_1 \in A|X_0 = x) \ge x_0^{-1-1/\alpha}\nu(A) \quad\text{for all } x \in S \text{ and } A \in \mathcal{B}.$$

Hence $S$ is a small set; see Definition A.2. Notice that, by the definition of $x_0$,

$$rE_0[L(X_1)|X_0 = x] < L(x) - 1 \quad\text{for all } x > x_0,$$
$$E_0[L(X_1)|X_0 = x] < \infty \quad\text{for all } x \in S = [1,x_0].$$

Thus all of the conditions in Theorem A.1 are satisfied; hence $\{X_t\}_{t=1}^{n}$ is geometrically ergodic, and geometric $\beta$-mixing.

(2) For the Gumbel copula, let $\{Y_t\}_{t=1}^{n}$ be a stationary Markov process of order 1 generated from a bivariate Gumbel copula and a marginal CDF $G_0(\cdot)$. Then the transformed process $\{U_t \equiv G_0(Y_t)\}_{t=1}^{n}$ has uniform marginals and $(U_{t-1},U_t)$ has the Gumbel copula joint distribution (see Example 2.2). When $\alpha = 1$, the Gumbel copula becomes the independence copula; hence the process $\{U_t \equiv G_0(Y_t)\}_{t=1}^{n}$ is i.i.d. and trivially geometrically ergodic.

Let $\alpha > 1$. Let $X_t = (-\log U_t)^{\alpha}$. Then $U_t = F(X_t)$, with $F(x) = \exp\{-x^{1/\alpha}\}$. Let $f(x) = \alpha^{-1}x^{1/\alpha-1}\exp\{-x^{1/\alpha}\}$. Then for $X_t$ we have

$$\Pr(X_{t+1} \ge x_2|X_t = x_1) = \frac{f(x_1+x_2)}{f(x_1)}, \qquad x_1,x_2 > 0.$$

Hence

$$E_0(X_{t+1}|X_t = x_1) = \int_0^{\infty}\Pr(X_{t+1} \ge x_2|X_t = x_1)\,dx_2 = \int_0^{\infty}\frac{f(x_1+x_2)}{f(x_1)}\,dx_2 = \frac{F(x_1)}{f(x_1)} = \alpha x_1^{1-(1/\alpha)}.$$

Note that as $x_1 \to 0$,

$$x_1^{1/(2\alpha)}E_0(X_{t+1}^{-1/(2\alpha)}|X_t = x_1) = \int_0^{\infty}u^{-1/(2\alpha)}\,\frac{-f'(x_1+x_1u)}{f(x_1)}\,x_1\,du \to (1-1/\alpha)\int_0^{\infty}u^{-1/(2\alpha)}(1+u)^{1/\alpha-2}\,du = (1-1/\alpha)\int_0^1 t^{-1/(2\alpha)}(1-t)^{-1/(2\alpha)}\,dt,$$

where the last relation is due to

$$\lim_{x_1\to 0}\frac{-f'(x_1+x_1u)}{f(x_1)}\times x_1 = (1-1/\alpha)(1+u)^{1/\alpha-2}.$$

Observe that, as $\alpha > 1$,

$$(1-1/\alpha)\int_0^1 t^{-1/(2\alpha)}(1-t)^{-1/(2\alpha)}\,dt = (1-1/\alpha)\times B(1-1/(2\alpha),1-1/(2\alpha)) < 1,$$

where $B(\cdot,\cdot)$ is the beta function.

Let $L(x) = x^{-1/(2\alpha)} + x$ be the Lyapunov function. Let $r = \inf_{x>0}L(x)/2$. Then

$$\lim_{x\to 0}\frac{E_0(L(X_{t+1})|X_t = x)}{L(x)} < 1$$

and

$$\lim_{x\to\infty}\frac{E_0(L(X_{t+1})|X_t = x)}{L(x)} = 0.$$

Let $S = [1/\Lambda,\Lambda]$ with sufficiently large $\Lambda > 0$. Then $S$ is a small set. So all conditions in Theorem A.1 are satisfied; hence $\{X_t\}_{t=1}^{n}$ is geometrically ergodic and geometrically $\beta$-mixing.

(3) For Student's t copula, let $\{Y_t\}_{t=1}^{n}$ be a stationary Markov process of order 1 generated from a bivariate t-copula and a marginal CDF $G_0(\cdot)$. Then the transformed process $\{U_t \equiv G_0(Y_t)\}_{t=1}^{n}$ satisfies the following:

$$t_\nu^{-1}(U_t) = \rho\,t_\nu^{-1}(U_{t-1}) + e_t\sqrt{\frac{\nu + [t_\nu^{-1}(U_{t-1})]^2}{\nu+1}(1-\rho^2)},$$

where $e_t \sim t_{\nu+1}$, and is independent of $U^{t-1} \equiv (U_{t-1},\ldots,U_1)$ [see, e.g., Chen, Koenker and Xiao (2008)]. Let $X_t = t_\nu^{-1}(U_t)$. Then

$$X_t = \rho X_{t-1} + \sigma(X_{t-1})e_t, \qquad \sigma(X_{t-1}) = \sqrt{\frac{\nu + X_{t-1}^2}{\nu+1}(1-\rho^2)},$$

where $e_t \sim t_{\nu+1}$, and is independent of $X^{t-1} \equiv (X_{t-1},\ldots,X_1)$. Let $L(x) = |x| + 1 \ge 1$ be the Lyapunov function. Then $E_0\{L(X_t)\} = \frac{2\sqrt{\nu}\,\Gamma((\nu+1)/2)}{(\nu-1)\sqrt{\pi}\,\Gamma(\nu/2)} + 1 < \infty$ provided that $\nu > 1$. Then

$$E_0(L(X_t)|X_{t-1} = x) = E_0(|\rho X_{t-1} + \sigma(X_{t-1})e_t|\,|\,X_{t-1} = x) + 1 = E_0(|\rho x + \sigma(x)e_t|) + 1 < \sqrt{E_0(|\rho x + \sigma(x)e_t|^2)} + 1 = \sqrt{\rho^2x^2 + \sigma^2(x)E_0[e_t^2]} + 1,$$

where the strict inequality is due to $e_t \sim t_{\nu+1}$ and for fixed $x$,

$$0 < \mathrm{Var}(|\rho x + \sigma(x)e_t|) = E(|\rho x + \sigma(x)e_t|^2) - [E_0(|\rho x + \sigma(x)e_t|)]^2.$$

Since $\sigma^2(x) = (1-\rho^2)(\nu + x^2)/(\nu+1)$, we have

$$\lim_{|x|\to\infty}\frac{E_0(L(X_t)|X_{t-1} = x)}{L(x)} = \lim_{|x|\to\infty}\frac{E_0(|\rho x + \sigma(x)e_t|) + 1}{|x|+1} < \lim_{|x|\to\infty}\frac{\sqrt{\rho^2x^2 + \sigma^2(x)E_0[e_t^2]} + 1}{|x|+1} = \sqrt{\rho^2 + (1-\rho^2)\frac{E_0[e_t^2]}{\nu+1}} \le \sqrt{\rho^2 + (1-\rho^2)\frac{E[t_3^2]}{3}} = 1,$$

where the last inequality is due to $E_0[e_t^2]/(\nu+1)$ decreasing in $\nu \in [2,\infty]$, and the last equality is due to $E[t_3^2] = 3$. Then we can choose a small set $S = [-x_0,x_0]$ with sufficiently large $x_0 > 0$. Clearly the transition density of $\{X_t\}$ is bounded from above and below on a compact set. Hence, all conditions in Theorem A.1 or in Proposition 2.1(i) of Chen and Fan (2006) are satisfied, and $\{X_t\}$ is geometrically ergodic (hence geometrically $\beta$-mixing). □

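Proof of Proposition 3.1. Since most of the conditions of consistency in Theorem 3.1 of Chen (2007) are already assumed in our Assumptions M, 3.1 and 3.2, it suffices to verify Condition 3.5 (uniform convergence over sieves) of Chen (2007). Assumption M implies that $\{Y_t\}_{t=1}^{n}$ is stationary ergodic. This and Assumption 3.2 imply that the Glivenko-Cantelli theorem for stationary ergodic processes is applicable, and hence $\sup_{\gamma\in\Gamma_n}|L_n(\gamma) - E\{L_n(\gamma)\}| = o_P(1)$. The result now follows from Theorem 3.1 of Chen (2007). □

Proof of Lemma 4.1. For (1), recall that $Z_t = (Y_{t-1},Y_t)$, under Assumptions M, 3.1(1), (2), 4.1 and 4.2, we have, for all $s < t$,

$$E_0\left(\frac{\partial\ell(\gamma_0,Z_s)}{\partial\gamma'}[\tilde v]\,\frac{\partial\ell(\gamma_0,Z_t)}{\partial\gamma'}[v]\right) = E_0\left(\frac{\partial\ell(\gamma_0,Z_s)}{\partial\gamma'}[\tilde v]\,E_0\left(\frac{\partial\ell(\gamma_0,Z_t)}{\partial\gamma'}[v]\,\Big|\,Y^{t-1}\right)\right).$$

Recall that the true conditional density function is $p_0(Y_t|Y^{t-1}) = g_0(Y_t)\times c(G_0(Y_{t-1}),G_0(Y_t);\alpha_0) \equiv h(Y_t|Y_{t-1};\gamma_0)$. We have

$$E_0\left(\frac{\partial\ell(\gamma_0,Z_t)}{\partial\gamma'}[v]\,\Big|\,Y^{t-1}\right) = \int\frac{\frac{\partial h(y_t|Y_{t-1};\gamma_0)}{\partial\gamma'}[v]}{h(y_t|Y_{t-1};\gamma_0)}\,h(y_t|Y_{t-1};\gamma_0)\,dy_t = \int\frac{\partial h(y_t|Y_{t-1};\gamma_0)}{\partial\gamma'}[v]\,dy_t = \frac{\partial(\int h(y_t|Y_{t-1};\gamma_0+\eta v)\,dy_t)}{\partial\eta}\bigg|_{\eta=0} = \frac{\partial(1)}{\partial\eta}\bigg|_{\eta=0} = 0,$$

where the order of differentiation and integration can be reversed due to Assumption 4.3.

For (2), the above equality also implies that $\{\frac{\partial\ell(\gamma_0,Z_t)}{\partial\gamma'}[v]\}_{t=1}^{n}$ is a martingale difference sequence with respect to the filtration $\mathcal{F}_{t-1} = \sigma(Y_1;\ldots;Y_{t-1})$.

For (3), since $\int h(y|Y_{t-1};\gamma_0+\eta v)\,dy \equiv 1$, by differentiating this equation with respect to $\eta$ twice and evaluating it at $\eta = 0$, we get $E_0((\frac{\partial\ell(\gamma_0,Z_t)}{\partial\gamma'}[v])^2\,|\,Y_{t-1}) = -E_0(\frac{\partial^2\ell(\gamma_0,Z_t)}{\partial\gamma\,\partial\gamma'}[v,v]\,|\,Y_{t-1})$, where the interchange of differentiation and integration is guaranteed by Assumption 4.3. □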



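Proof of Theorem 4.1. Let $\varepsilon_n$ be any positive sequence satisfying
$\varepsilon_n = o(n^{-1/2})$. Denote
$r[\gamma, \gamma_0, Z_t] = \ell(\gamma, Z_t) - \ell(\gamma_0, Z_t)
- \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[\gamma - \gamma_0]$ and
$\mu_n(g(Z_t)) = n^{-1} \sum_{t=2}^n [g(Z_t) - E_0 g(Z_t)]$. In the proof we let
$g(\cdot)$ be $\ell(\gamma, \cdot)$, $r[\gamma, \gamma_0, \cdot]$ or
$\frac{\partial \ell(\gamma_0, \cdot)}{\partial \gamma'}[v^*]$. Then by the
definition of the sieve MLE $\hat\gamma_n$ (with abuse of notation, we denote
it as $\gamma$ in the following),

$$
\begin{aligned}
0 &\le \frac{1}{n} \sum_{t=2}^n
      [\ell(\gamma, Z_t) - \ell(\gamma \pm \varepsilon_n \Pi_n v^*, Z_t)] \\
  &= \mu_n(\ell(\gamma, Z_t) - \ell(\gamma \pm \varepsilon_n \Pi_n v^*, Z_t))
   + E_0(\ell(\gamma, Z_t) - \ell(\gamma \pm \varepsilon_n \Pi_n v^*, Z_t))
   + o_p(n^{-1}) \\
  &= \mp \varepsilon_n \frac{1}{n} \sum_{t=2}^n
       \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[\Pi_n v^*]
   + \mu_n(r[\gamma, \gamma_0, Z_t]
          - r[\gamma \pm \varepsilon_n \Pi_n v^*, \gamma_0, Z_t]) \\
  &\quad + E_0(r[\gamma, \gamma_0, Z_t]
          - r[\gamma \pm \varepsilon_n \Pi_n v^*, \gamma_0, Z_t]) + o_p(n^{-1}).
\end{aligned}
$$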

Claim 1. $\frac{1}{n} \sum_{t=2}^n
\frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[\Pi_n v^* - v^*]
= o_p(n^{-1/2})$. This claim is true due to Chebyshev's inequality, serially
uncorrelated (Lemma 4.1) and identically distributed data, and
$\|\Pi_n v^* - v^*\| = o(1)$.

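Claim 2. $\mu_n(r[\gamma, \gamma_0, Z_t]
- r[\gamma \pm \varepsilon_n \Pi_n v^*, \gamma_0, Z_t])
= \varepsilon_n \times o_p(n^{-1/2})$. This claim holds since

$$
\begin{aligned}
&\mu_n(r[\gamma, \gamma_0, Z_t]
      - r[\gamma \pm \varepsilon_n \Pi_n v^*, \gamma_0, Z_t]) \\
&\quad = \mu_n\left( \ell(\gamma, Z_t)
      - \ell(\gamma \pm \varepsilon_n \Pi_n v^*, Z_t)
      \pm \varepsilon_n
        \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[\Pi_n v^*] \right) \\
&\quad = \mp \varepsilon_n\, \mu_n\left(
        \frac{\partial \ell(\tilde\gamma, Z_t)}{\partial \gamma'}[\Pi_n v^*]
      - \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[\Pi_n v^*] \right)
  = \varepsilon_n \times o_p(n^{-1/2}),
\end{aligned}
$$

where $\tilde\gamma \in \Gamma_n$ lies between $\gamma$ and
$\gamma \pm \varepsilon_n \Pi_n v^*$, and the last equality is implied
by Assumption 4.7.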

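Claim 3. $E_0(r[\gamma, \gamma_0, Z_t]
- r[\gamma \pm \varepsilon_n \Pi_n v^*, \gamma_0, Z_t])
= \pm \varepsilon_n \langle \gamma - \gamma_0, v^* \rangle
+ \varepsilon_n \times o_p(n^{-1/2}) + o_p(n^{-1})$. Note that

$$
\begin{aligned}
E_0(r[\gamma, \gamma_0, Z_t])
&= E_0\left( \ell(\gamma, Z_t) - \ell(\gamma_0, Z_t)
   - \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[\gamma - \gamma_0] \right) \\
&= \frac{1}{2} E_0\left(
   \frac{\partial^2 \ell(\tilde\gamma, Z_t)}{\partial \gamma\, \partial \gamma'}
   [\gamma - \gamma_0, \gamma - \gamma_0] \right) \\
&= \frac{1}{2} E_0\left(
   \frac{\partial^2 \ell(\gamma_0, Z_t)}{\partial \gamma\, \partial \gamma'}
   [\gamma - \gamma_0, \gamma - \gamma_0] \right)
   + \varepsilon_n \times o_p(n^{-1/2}) + o_p(n^{-1}),
\end{aligned}
$$

where $\tilde\gamma \in \Gamma_n$ is located between $\gamma$ and $\gamma_0$,
and the last equality is due to Assumption 4.6. By Lemma 4.1(3), we have

$$
\|\gamma - \gamma_0\|^2
= E_0\left( \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[\gamma - \gamma_0] \right)^2
= -E_0\left(
   \frac{\partial^2 \ell(\gamma_0, Z_t)}{\partial \gamma\, \partial \gamma'}
   [\gamma - \gamma_0, \gamma - \gamma_0] \right).
$$

Therefore,

$$
\begin{aligned}
&E_0(r[\gamma, \gamma_0, Z_t]
    - r[\gamma \pm \varepsilon_n \Pi_n v^*, \gamma_0, Z_t]) \\
&\quad = -\frac{\|\gamma - \gamma_0\|^2
        - \|\gamma \pm \varepsilon_n \Pi_n v^* - \gamma_0\|^2}{2}
        + \varepsilon_n \times o_p(n^{-1/2}) + o_p(n^{-1}) \\
&\quad = \pm \varepsilon_n \langle \gamma - \gamma_0, \Pi_n v^* \rangle
        + \frac{1}{2} \|\varepsilon_n \Pi_n v^*\|^2
        + o_p(\varepsilon_n n^{-1/2}) + o_p(n^{-1}) \\
&\quad = \pm \varepsilon_n \times \langle \gamma - \gamma_0, v^* \rangle
        + \varepsilon_n \times o_p(n^{-1/2}) + o_p(n^{-1}).
\end{aligned}
$$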
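In summary, Claims 1, 2 and 3 imply that

$$
\begin{aligned}
0 &\le \frac{1}{n} \sum_{t=2}^n
      [\ell(\gamma, Z_t) - \ell(\gamma \pm \varepsilon_n \Pi_n v^*, Z_t)] \\
  &= \mp \varepsilon_n \frac{1}{n} \sum_{t=2}^n
       \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[v^*]
     \pm \varepsilon_n \times \langle \gamma - \gamma_0, v^* \rangle
     + \varepsilon_n \times o_p(n^{-1/2}) + o_p(n^{-1}) \\
  &= \mp \varepsilon_n\, \mu_n\left(
       \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[v^*] \right)
     \pm \varepsilon_n \times \langle \gamma - \gamma_0, v^* \rangle
     + \varepsilon_n \times o_p(n^{-1/2}) + o_p(n^{-1}).
\end{aligned}
$$

Thus we obtain

$$
\sqrt{n}\, \langle \gamma - \gamma_0, v^* \rangle
= \sqrt{n}\, \mu_n\left(
    \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[v^*] \right) + o_p(1)
\Rightarrow N(0, \|v^*\|^2),
$$

where the asymptotic normality is guaranteed by Billingsley's (1961a)
ergodic stationary martingale difference CLT, and the asymptotic variance
being equal to $\|v^*\|^2 = \|\partial \rho(\gamma_0) / \partial \gamma'\|^2$
is implied by Lemma 4.1(1) and the definition of the Fisher norm
$\|\cdot\|$. □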



Proof of Theorem 4.2. Given our normality results in Theorem 4.1, for our
model we can take
$\Sigma_n(v) = \frac{1}{\sqrt{n}} \sum_{t=2}^n
\frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[v]$, which is linear in
$v$ and converges in distribution to $N(0, \|v\|^2)$, and
$\frac{1}{n} \sum_{t=2}^n
\left( \frac{\partial \ell(\gamma_0, Z_t)}{\partial \gamma'}[v] \right)^2
= \|v\|^2 + o_p(1)$, and hence LAN holds. Notice that the proof in Wong (1992)
allows for time series data, and following his proof, under LAN, we obtain
that $\rho(\hat\gamma_n)$ achieves the semiparametric efficiency bound.
Alternatively, we can conclude that $\rho(\hat\gamma_n)$ is semiparametrically
efficient by applying the result of Bickel and Kwon (2001), which allows for
strictly stationary semiparametric Markov models. □

Proof of Proposition 4.1. Thanks to Lemma 4.1, we can directly extend the
results in Bickel et al. (1993) for bivariate copula models with i.i.d. data
to our copula-based first-order Markov time series setting. So the
semiparametric efficiency bound for $\alpha_0$ is
$\mathcal{I}_*(\alpha_0) = E_0\{S_{\alpha_0} S_{\alpha_0}'\}$, where
$S_{\alpha_0}$ is the efficient score function for $\alpha_0$, which is
defined as the ordinary score function for $\alpha_0$ minus its population
least squares orthogonal projection onto the closed linear span (clsp) of the
score functions for the nuisance parameters $g_0$. And $\alpha_0$ is
$\sqrt{n}$-efficiently estimable if and only if
$E_0\{S_{\alpha_0} S_{\alpha_0}'\}$ is nonsingular [see, e.g., Bickel et al.
(1993)]. Hence (4.3) is clearly a necessary condition for
$\sqrt{n}$-normality and efficiency of $\hat\alpha_n$ for $\alpha_0$. Under
Assumptions 4.2, 4.3 and 4.4', Propositions 4.7.4 and 4.7.6 of Bickel et al.
[(1993), pages 165-168] for bivariate copula models apply. Therefore, with
$S_{\alpha_0}$ defined in (4.4), we have that
$\mathcal{I}_*(\alpha_0) = E_0\{S_{\alpha_0} S_{\alpha_0}'\}$ is finite and
positive definite. This implies that Assumption 4.4 is satisfied with
$\rho(\gamma) = \lambda'\alpha$ and $\omega = \infty$, and
$\|v^*\|^2 = \|\partial \rho(\gamma_0) / \partial \gamma'\|^2
= \lambda' \mathcal{I}_*(\alpha_0)^{-1} \lambda < \infty$. By Theorem 4.1,
for any $\lambda \in \mathcal{R}^d$, $\lambda \ne 0$, we have
$\sqrt{n}(\lambda'\hat\alpha_n - \lambda'\alpha_0)
\Rightarrow N(0, \lambda' \mathcal{I}_*(\alpha_0)^{-1} \lambda)$. This implies
Proposition 4.1. □
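
The projection that defines the efficient score has a familiar finite-dimensional analogue, which may help fix ideas. The sketch below is an illustration only, under the assumption that the nuisance parameter is a finite vector (in the paper it is the infinite-dimensional marginal density, and the projection is onto a closed linear span). The function names and the numerical information matrix are hypothetical.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def efficient_information(i_aa: float,
                          i_ag: List[float],
                          i_gg: List[List[float]]) -> float:
    """I_* = I_aa - I_ag I_gg^{-1} I_ga for a scalar parameter of interest.

    i_aa: information for the parameter of interest,
    i_ag: cross information with the nuisance parameters,
    i_gg: information matrix of the nuisance parameters.
    """
    weights = solve(i_gg, i_ag)  # least squares projection coefficients
    return i_aa - sum(c * w for c, w in zip(i_ag, weights))


# Hypothetical information blocks: one parameter of interest, two nuisance ones.
i_star = efficient_information(4.0, [1.0, 0.0], [[2.0, 1.0], [1.0, 3.0]])
asymptotic_variance = 1.0 / i_star
```

In this toy case the information for the parameter of interest drops from 4 to 3.4 once the nuisance scores are projected out, so the efficient asymptotic variance is 1/3.4 rather than 1/4.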

Proof of Theorem 5.1. The proof basically follows from that of Shen 
and Shi (2005), except for our definition of joint log-likelihood, our definition 
of Fisher norm || • ||, and our application of Billingsley's CLT for ergodic 
stationary martingale difference processes. These modifications are the same 
as those in our proof of Theorem 4.1. A detailed proof is omitted due to the 
length of the paper but is available upon request. □ 

APPENDIX B: TABLES AND FIGURES 

Different estimators: Sieve = Sieve MLE; Ideal = Ideal MLE; 2step =
Chen-Fan; Para = correctly specified parametric MLE; Mis-N = parametric
MLE using misspecified normal distribution as marginal; Mis-EV = parametric
MLE using misspecified extreme value distribution as marginal.

Results are all based on 1000 MC replications of estimates using n = 1000
time series simulation. Bias^2_{10^3}, Var_{10^3} and MSE_{10^3} are the true
values of Bias^2, Var and MSE multiplied by 1000 respectively, τ = Kendall's τ,
λ = lower tail dependence index.

Table 1

Clayton copula, true marginal G = t_3: estimation of α

                              Sieve          Ideal          2step          Para           Mis-N          Mis-EV
α = 2       Mean              1.969          2.002          1.912          1.989          2.400          2.957
τ(0.500)    Bias              -0.031         0.002          -0.088         -0.011         0.400          0.957
λ(0.707)    Var               0.019          0.007          0.101          0.012          0.103          0.056
            MSE               0.020          0.007          0.109          0.012          0.264          0.971
            α^MC_(2.5,97.5)   (1.70, 2.25)   (1.83, 2.17)   (1.36, 2.60)   (1.76, 2.19)   (1.99, 3.28)   (2.57, 3.36)

α = 5       Mean              4.849          5.003          4.359          4.979          5.859          5.923
τ(0.714)    Bias              -0.151         0.003          -0.642         -0.021         0.859          0.923
λ(0.871)    Var               0.093          0.026          1.247          0.041          0.189          0.338
            MSE               0.116          0.026          1.658          0.042          0.927          1.190
            α^MC_(2.5,97.5)   (4.25, 5.48)   (4.69, 5.32)   (2.67, 7.12)   (4.58, 5.35)   (5.36, 6.95)   (4.89, 6.62)

α = 10      Mean              9.687          10.00          7.115          9.967          11.42          11.57
τ(0.833)    Bias              -0.313         0.004          -2.886         -0.033         1.425          1.570
λ(0.933)    Var               0.351          0.085          4.852          0.129          0.577          1.194
            MSE               0.449          0.085          13.18          0.130          2.607          3.659
            α^MC_(2.5,97.5)   (8.68, 10.87)  (9.43, 10.6)   (3.87, 12.5)   (9.26, 10.6)   (10.33, 12.9)  (9.68, 12.9)

α = 12      Mean              11.62          12.01          7.896          11.98          13.67          13.82
τ(0.857)    Bias              -0.382         0.012          -4.104         -0.016         1.668          1.816
λ(0.944)    Var               0.541          0.119          5.656          0.222          0.770          1.917
            MSE               0.687          0.120          22.50          0.222          3.552          5.214
            α^MC_(2.5,97.5)   (10.5, 13.3)   (11.3, 12.7)   (4.35, 13.6)   (11.0, 12.9)   (12.3, 15.7)   (11.4, 15.4)
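
For readers who want to reproduce summaries of this kind from their own replications, the following is a minimal sketch. It assumes that Var is the average squared deviation from the Monte Carlo mean (so that MSE = Bias^2 + Var exactly) and that percentiles are obtained by linear interpolation between order statistics; neither convention is stated in the source, and the function names and sample values are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Percentile by linear interpolation between order statistics."""
    position = (p / 100.0) * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return (1.0 - weight) * sorted_values[lower] + weight * sorted_values[upper]


def mc_summary(estimates: Sequence[float], truth: float) -> Dict[str, float]:
    """Mean, bias, variance, MSE and (2.5, 97.5) percentiles over replications."""
    mean = fmean(estimates)
    bias = mean - truth
    var = fmean([(e - mean) ** 2 for e in estimates])
    ordered: List[float] = sorted(estimates)
    return {
        "mean": mean,
        "bias": bias,
        "var": var,
        "mse": bias ** 2 + var,  # equals the average of (estimate - truth)^2
        "p2.5": percentile(ordered, 2.5),
        "p97.5": percentile(ordered, 97.5),
    }


# Hypothetical replications of an estimator of alpha with true value 2.
summary = mc_summary([1.9, 2.1, 2.0, 2.4], truth=2.0)
```

Multiplying `bias ** 2`, `var` and `mse` by 1000 gives quantities on the scale used for the Bias^2_{10^3}, Var_{10^3} and MSE_{10^3} entries.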
Table 2

Gumbel copula, true marginal G = t_3: estimation of α

                              Sieve          Ideal          2step          Para           Mis-N          Mis-EV
α = 2       Mean              2.002          1.999          1.982          1.992          2.377          1.864
τ(0.500)    Bias              0.002          -0.001         -0.018         -0.008         0.377          -0.136
            Var               0.007          0.002          0.013          0.005          0.153          --
            MSE               0.007          0.002          0.014          0.005          0.295          --
            α^MC_(2.5,97.5)   (1.85, 2.18)   (1.91, 2.10)   (1.78, 2.23)   (1.85, 2.14)   (1.99, 3.55)   (1.60, 2.22)

α = 3.5     Mean              3.486          3.498          3.352          3.481          3.906          3.629
τ(0.714)    Bias              -0.014         -0.002         -0.148         -0.019         0.406          0.129
            Var               0.064          0.008          0.130          0.021          0.269          --
            MSE               0.064          0.008          0.152          0.021          0.434          0.331
            α^MC_(2.5,97.5)   (3.06, 4.07)   (3.34, 3.68)   (2.76, 4.20)   (3.21, 3.87)   (3.21, 5.38)   (2.73, 4.83)

α = 6       Mean              --             5.998          5.253          5.971          6.359          6.881
τ(0.833)    Bias              --             -0.002         -0.747         -0.029         0.359          0.881
            Var               0.320          0.023          0.676          0.071          0.396          2.328
            MSE               0.362          0.023          1.235          0.072          0.525          3.103
            α^MC_(2.5,97.5)   (4.67, 6.95)   (5.72, 6.31)   (3.92, 7.17)   (5.47, 6.67)   (5.20, 7.48)   (4.32, 9.78)

α = 7       Mean              6.667          6.997          5.873          6.971          7.357          8.257
τ(0.857)    Bias              -0.333         -0.003         -1.127         -0.029         0.357          1.257
            Var               0.456          0.032          0.968          0.106          0.506          3.859
            MSE               0.566          0.032          2.238          0.107          0.633          5.438
            α^MC_(2.5,97.5)   (5.34, 8.12)   (6.67, 7.37)   (4.23, 8.20)   (6.34, 7.79)   (6.01, 8.58)   (4.96, 12.24)
Table 3

Clayton copula, true marginal G = t_3: estimation of G. Reported Bias^2, Var and MSE
are the true ones multiplied by 1000

                            Sieve             2step             Para              Mis-N             Mis-EV
                            Q_{1/3}  Q_{2/3}  Q_{1/3}  Q_{2/3}  Q_{1/3}  Q_{2/3}  Q_{1/3}  Q_{2/3}  Q_{1/3}  Q_{2/3}
α = 2       Mean            0.325    0.673    0.333    0.666    0.333    0.667    0.347    0.557    0.382    0.614
τ(0.500)    Bias^2_{10^3}   0.026    0.007    0.011    0.013    0.009    0.009    0.282    12.84    2.710    3.145
λ(0.707)    Var_{10^3}      0.054    0.049    1.430    0.801    0.002    0.002    1.921    5.651    0.755    0.947
            MSE_{10^3}      0.080    0.056    1.441    0.814    0.011    0.011    2.203    18.49    3.465    4.092

α = 5       Mean            0.322    0.671    0.332    0.667    0.333    0.667    0.331    0.537    0.342    0.579
τ(0.714)    Bias^2_{10^3}   0.072    0.002    0.003    0.011    0.009    0.009    0.001    17.65    0.134    8.276
λ(0.871)    Var_{10^3}      0.081    0.085    6.474    2.969    0.002    0.002    1.401    5.697    2.234    5.346
            MSE_{10^3}      0.153    0.087    6.478    2.980    0.011    0.011    1.403    23.35    2.369    13.62

α = 10      Mean            0.319    0.664    0.331    0.666    0.333    0.667    0.364    0.584    0.371    0.624
τ(0.833)    Bias^2_{10^3}   0.128    0.042    0.001    0.013    0.009    0.009    1.132    7.452    1.642    2.123
λ(0.933)    Var_{10^3}      0.109    0.137    22.28    9.800    0.003    0.003    0.711    3.410    2.103    4.192
            MSE_{10^3}      0.236    0.178    22.29    9.813    0.012    0.012    1.843    10.86    3.744    6.315

α = 12      Mean            0.318    0.661    0.331    0.665    0.333    0.667    0.374    0.598    0.375    0.633
τ(0.857)    Bias^2_{10^3}   0.154    0.079    0.001    0.023    0.010    0.010    1.903    5.242    2.052    1.351
λ(0.944)    Var_{10^3}      0.127    0.141    28.83    12.08    0.003    0.003    0.950    2.662    2.494    4.934
            MSE_{10^3}      0.281    0.220    28.83    12.10    0.013    0.013    2.853    7.904    4.547    6.286
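
The τ and λ values shown next to each α in the Clayton tables, and the τ values in the Gumbel tables, agree with the standard closed forms for these two families: τ = α/(α + 2) and λ_L = 2^{-1/α} for Clayton, and τ = 1 - 1/α for Gumbel. The short sketch below recomputes them; the function names are illustrative only.

```python
def clayton_kendall_tau(alpha: float) -> float:
    """Kendall's tau of the Clayton copula, alpha > 0."""
    return alpha / (alpha + 2.0)


def clayton_lower_tail(alpha: float) -> float:
    """Lower tail dependence index of the Clayton copula, alpha > 0."""
    return 2.0 ** (-1.0 / alpha)


def gumbel_kendall_tau(alpha: float) -> float:
    """Kendall's tau of the Gumbel copula, alpha >= 1."""
    return 1.0 - 1.0 / alpha


# Parameter values used in the Clayton and Gumbel designs of the tables.
clayton_rows = [(a, round(clayton_kendall_tau(a), 3), round(clayton_lower_tail(a), 3))
                for a in (2.0, 5.0, 10.0, 12.0)]
gumbel_rows = [(a, round(gumbel_kendall_tau(a), 3)) for a in (2.0, 3.5, 6.0, 7.0)]
```

Both designs therefore span the same range of Kendall's τ (0.500 to 0.857), which makes the Clayton and Gumbel results comparable in terms of dependence strength.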



Table 4 

Gumbel copula, true marginal G = t_3: estimation of G

                            Sieve             2step             Para              Mis-N             Mis-EV
                            Q_{1/3}  Q_{2/3}  Q_{1/3}  Q_{2/3}  Q_{1/3}  Q_{2/3}  Q_{1/3}  Q_{2/3}  Q_{1/3}  Q_{2/3}
α = 2       Mean            0.328    0.673    0.333    0.666    0.333    0.667    0.401    0.613    0.519    0.737
τ(0.500)    Bias^2_{10^3}   0.004    0.011    0.007    0.018    0.009    0.009    5.069    3.239    35.53    4.456
            Var_{10^3}      0.059    0.063    0.755    1.025    0.003    0.003    2.389    3.111    10.44    7.202
            MSE_{10^3}      0.063    0.074    0.762    1.043    0.012    0.012    7.457    6.350    45.98    11.66

α = 3.5     Mean            0.328    0.675    0.332    0.665    0.333    0.667    0.524    0.719    0.565    0.746
τ(0.714)    Bias^2_{10^3}   0.004    0.025    0.005    0.030    0.009    0.009    37.55    2.386    55.42    5.762
            Var_{10^3}      0.139    0.147    2.353    3.482    0.004    0.004    18.71    9.238    28.40    18.12
            MSE_{10^3}      0.143    0.171    2.358    3.511    0.013    0.013    56.26    11.62    83.82    23.88

α = 6       Mean            0.325    0.681    0.330    0.664    0.333    0.667    0.501    0.700    0.497    0.676
τ(0.833)    Bias^2_{10^3}   0.025    0.120    0.000    0.036    0.009    0.009    29.17    0.899    27.97    0.037
            Var_{10^3}      0.273    0.255    6.840    10.37    0.005    0.005    40.49    20.60    40.98    29.81
            MSE_{10^3}      0.298    0.375    6.840    10.41    0.014    0.014    69.66    21.50    68.96    29.84

α = 7       Mean            0.324    0.684    0.329    0.665    0.333    0.667    0.477    0.679    0.476    0.655
τ(0.857)    Bias^2_{10^3}   0.041    0.182    0.000    0.029    0.009    0.009    21.46    0.076    21.35    0.227
            Var_{10^3}      0.314    0.275    9.362    13.79    0.006    0.006    49.51    26.89    45.82    33.93
            MSE_{10^3}      0.355    0.457    9.362    13.82    0.016    0.016    70.97    26.96    67.16    34.16
Table 5

Clayton copula, true marginal G = ts: estimation of 0.01 conditional quantile

                               Sieve      Ideal      2step      Para       Mis-N      Mis-EV
α = 5       IntBias^2_{10^3}   36.26      0.000      150.0      0.172      900.7      704.8
τ(0.714)    IntVar_{10^3}      32.15      5.450      985.3      10.18      463.7      313.4
λ(0.871)    IntMSE_{10^3}      68.41      5.450      1135       10.35      1364       1018

α = 10      IntBias^2_{10^3}   7.712      0.000      527.3      0.040      815.3      427.4
τ(0.833)    IntVar_{10^3}      19.36      2.475      855.3      3.716      361.7      202.7
λ(0.933)    IntMSE_{10^3}      27.07      2.475      1383       3.756      1177       630.1

α = 12      IntBias^2_{10^3}   2.851      0.000      367.7      0.004      181.1      175.9
τ(0.857)    IntVar_{10^3}      6.236      1.068      590.9      1.578      59.44      46.12
λ(0.944)    IntMSE_{10^3}      9.086      1.069      958.7      1.582      240.5      222.0

For each α, evaluation is based on the common support of 1000 MC simulated data.
Reported integrated Bias^2, integrated Var and the integrated MSE are the true ones
multiplied by 1000.
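
Integrated measures of this kind can be computed from replicated curve estimates along the following lines. This is a sketch under stated assumptions, not the authors' code: the source does not give the integration rule or the grid, so a trapezoid rule on a user-supplied grid is assumed, pointwise variance is the average squared deviation from the pointwise mean, and all names and toy inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def trapezoid(values: Sequence[float], grid: Sequence[float]) -> float:
    """Trapezoid-rule integral of values over an increasing grid."""
    return sum(0.5 * (values[j] + values[j + 1]) * (grid[j + 1] - grid[j])
               for j in range(len(grid) - 1))


def integrated_measures(grid: Sequence[float],
                        truth: Sequence[float],
                        curves: Sequence[Sequence[float]]
                        ) -> Tuple[float, float, float]:
    """Integrated squared bias, variance and MSE of replicated curve estimates.

    grid:   evaluation points (e.g. values of the lagged observation),
    truth:  true function values on the grid,
    curves: one estimated curve per Monte Carlo replication, on the same grid.
    """
    n_rep = len(curves)
    bias2: List[float] = []
    var: List[float] = []
    for j in range(len(grid)):
        column = [curve[j] for curve in curves]
        mean = sum(column) / n_rep
        bias2.append((mean - truth[j]) ** 2)
        var.append(sum((c - mean) ** 2 for c in column) / n_rep)
    int_bias2 = trapezoid(bias2, grid)
    int_var = trapezoid(var, grid)
    return int_bias2, int_var, int_bias2 + int_var


# Toy input: two replications of a curve on a three-point grid.
int_bias2, int_var, int_mse = integrated_measures(
    grid=[0.0, 1.0, 2.0],
    truth=[0.0, 0.0, 0.0],
    curves=[[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]],
)
```

In the toy input the pointwise mean is 2, so the squared bias is 4 and the variance is 1 at every grid point; integrating over [0, 2] gives 8, 2 and 10.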




[Figure 2, panels (a)-(d). Panels (a) and (c): true = solid, sieve = dashed,
2-step = dotted. Panels (b) and (d): true = solid, parametric = dashed,
misspecified normal = dotted, misspecified EV = dash-dot.]

Fig. 2. Clayton copula (true α = 10, marginal G = ts, ts): estimation of 0.01 conditional
quantile. Evaluation is based on the common support of 1000 MC simulated data.
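
For a copula-based first-order Markov model, the q-th conditional quantile of Y_t given Y_{t-1} = y can be written as G^{-1}(C_{2|1}^{-1}(q | G(y); α)), where C_{2|1}(v | u) is the derivative of C(u, v) in u. For the Clayton copula this conditional distribution inverts in closed form. The sketch below illustrates the computation; it uses a standard normal marginal purely as a stand-in (the Python standard library has no Student's t quantile function), and the function names are hypothetical.

```python
from statistics import NormalDist
from typing import Callable


def clayton_h(v: float, u: float, alpha: float) -> float:
    """Conditional distribution C_{2|1}(v | u) of the Clayton copula."""
    s = u ** (-alpha) + v ** (-alpha) - 1.0
    return u ** (-(alpha + 1.0)) * s ** (-(1.0 + 1.0 / alpha))


def clayton_h_inverse(w: float, u: float, alpha: float) -> float:
    """Solve C_{2|1}(v | u) = w for v in closed form."""
    inner = 1.0 + u ** (-alpha) * (w ** (-alpha / (1.0 + alpha)) - 1.0)
    return inner ** (-1.0 / alpha)


def conditional_quantile(q: float,
                         y_prev: float,
                         alpha: float,
                         cdf: Callable[[float], float],
                         quantile: Callable[[float], float]) -> float:
    """q-th conditional quantile of Y_t given Y_{t-1} = y_prev."""
    return quantile(clayton_h_inverse(q, cdf(y_prev), alpha))


# Stand-in marginal (assumption): standard normal instead of Student's t.
marginal = NormalDist()
q01 = conditional_quantile(0.01, 0.5, 10.0, marginal.cdf, marginal.inv_cdf)
q50 = conditional_quantile(0.50, 0.5, 10.0, marginal.cdf, marginal.inv_cdf)
round_trip = clayton_h(clayton_h_inverse(0.01, 0.7, 10.0), 0.7, 10.0)
```

The same inverse also gives a simulation recipe: drawing W_t uniform on (0, 1) and setting U_t = `clayton_h_inverse(W_t, U_{t-1}, alpha)` generates the chain on the uniform scale, and Y_t = G^{-1}(U_t) maps it to the chosen marginal.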






Chen, J., Peng, L. and Zhao, Y. (2009). Empirical likelihood based confidence intervals
for copulas. J. Multivariate Anal. 100 137-151. MR2460483

Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. Handbook
of Econometrics (J. J. Heckman and E. E. Leamer, eds.) 6B 5549-5632. North Holland,
Amsterdam.

Chen, X. and Fan, Y. (2006). Estimation of copula-based semiparametric time series
models. J. Econometrics 130 307-335. MR2211797

Chen, X., Fan, Y. and Tsyrennikov, V. (2006). Efficient estimation of semiparametric
multivariate copula models. J. Amer. Statist. Assoc. 101 1228-1240. MR2328309

Chen, X., Koenker, R. and Xiao, Z. (2008). Copula-based nonlinear quantile
autoregression. Econom. J. To appear.

Chen, X. and Shen, X. (1998). Sieve extremum estimates for weakly dependent data.
Econometrica 66 289-314. MR1612238

Chen, X., Wu, W. B. and Yi, Y. (2009). Efficient estimation of copula-based
semiparametric Markov models. Available at http://arxiv.org/abs/0901.0751v3.

Darsow, W., Nguyen, B. and Olsen, E. (1992). Copulas and Markov processes. Illinois
J. Math. 36 600-642. MR1215798

Davydov, Y. (1973). Mixing conditions for Markov chains. Theory Probab. Appl. 312-328.

de la Peña, V., Ibragimov, R. and Sharakhmetov, S. (2006). Characterizations of
joint distributions, copulas, information, dependence and decoupling, with applications
to time series. In Optimality (J. Rojo, ed.) 183-209. IMS, Beachwood, OH. MR2337835

Doukhan, P., Massart, P. and Rio, E. (1995). Invariance principles for absolutely regular
empirical processes. Ann. Inst. H. Poincaré Probab. Statist. 31 393-427. MR1324814

Embrechts, P. (2009). Copulas: A personal view. Journal of Risk and Insurance. To
appear.

Embrechts, P., McNeil, A. and Straumann, D. (2002). Correlation and dependence
properties in risk management: Properties and pitfalls. In Risk Management: Value at
Risk and Beyond (M. Dempster, ed.) 176-223. Cambridge Univ. Press. MR1892190

Fan, J. and Jiang, J. (2007). Nonparametric inference with generalized likelihood ratio
test. Test 16 409-478. MR2365172

Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric
Methods. Springer, New York. MR1964455

Gao, J. (2007). Nonlinear Time Series: Semiparametric and Nonparametric Methods.
Chapman & Hall/CRC, London. MR2297190

Genest, C., Gendron, M. and Bourdeau-Brien, M. (2008). The advent of copulas in
finance. European Journal of Finance. To appear.

Genest, C., Ghoudi, K. and Rivest, L. (1995). A semiparametric estimation procedure
of dependence parameters in multivariate families of distributions. Biometrika 82
543-552. MR1366280

Genest, C. and Werker, B. (2002). Conditions for the asymptotic semiparametric
efficiency of an omnibus estimator of dependence parameters in copula models. In
Distributions With Given Marginals and Statistical Modeling (C. M. Cuadras, J. Fortiana
and J. Rodríguez-Lallena, eds.) 103-112. Springer, New York. MR2058984

Granger, C. W. J. (2003). Time series concepts for conditional distributions. Oxford
Bulletin of Economics and Statistics 65 supplement 689-701.

Ibragimov, R. (2009). Copulas-based characterizations and higher-order Markov
processes. Econometric Theory. To appear.

Ibragimov, R. and Lentzas, G. (2008). Copulas and long memory. Working paper,
Harvard Univ.






Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall/CRC.
MR1462613

Klaassen, C. and Wellner, J. (1997). Efficient estimation in the bivariate normal copula
model: Normal margins are least favourable. Bernoulli 3 55-77. MR1466545

Kortschak, D. and Albrecher, H. (2009). Asymptotic results for the sum of dependent
non-identically distributed random variables. Method. Comput. Appl. Probab. 11
279-306. DOI: 10.1007/s11009-007-9053-3.

Li, Q. and Racine, J. (2007). Nonparametric Econometrics. Princeton Univ. Press, NJ.
MR2283034

McNeil, A. J., Frey, R. and Embrechts, P. (2005). Quantitative Risk Management:
Concepts, Techniques, and Tools. Princeton Univ. Press, Princeton, NJ. MR2175089

Murphy, S. and van der Vaart, A. (2000). On profile likelihood. J. Amer. Statist. Assoc.
95 449-465. MR1803168

Nelsen, R. B. (2006). An Introduction to Copulas, 2nd ed. Springer, New York.
MR2197664

Nze, P. A. and Doukhan, P. (2004). Weak dependence: Models and applications to
econometrics. Econometric Theory 20 995-1045. MR2101950

Patton, A. (2002). Applications of copula theory in financial econometrics. J. Appl.
Econometrics 21 147-173.

Patton, A. (2006). Modelling asymmetric exchange rate dependence. International
Economic Review 47 527-556. MR2216591

Patton, A. (2008). Copula models for financial time series. In Handbook of Financial
Time Series. Springer.

Robinson, P. (1983). Non-parametric estimation for time series models. J. Time Ser.
Anal. 4 185-208.

Shen, X. (1997). On methods of sieves and penalization. Ann. Statist. 25 2555-2591.
MR1604416

Shen, X., Huang, H. and Ye, J. (2004). Inference after model selection. J. Amer. Statist.
Assoc. 99 751-762. MR2090908

Shen, X. and Shi, J. (2005). Sieve likelihood ratio inference on general parameter space.
Science in China 48 67-78. MR2156616

Wong, W. H. (1992). On asymptotic efficiency in estimation theory. Statist. Sinica 2
47-68. MR1152297

X. Chen
Cowles Foundation for Research in Economics
Yale University
30 Hillhouse Ave., Box 208281
New Haven, Connecticut 06520
USA
E-mail: xiaohong.chen@yale.edu

W. B. Wu
Department of Statistics
University of Chicago
5734 S. University Ave.
Chicago, Illinois 60637
USA
E-mail: wbwu@galton.uchicago.edu

Y. Yi
Department of Economics
New York University
19 West 4th St.
New York, New York 10012
USA
E-mail: yanping.yi@nyu.edu



